Plant transcriptional regulators of disease resistance

ABSTRACT

The invention relates to plant transcription factor polypeptides, polynucleotides that encode them, homologs from a variety of plant species, and methods of using the polynucleotides and polypeptides to produce transgenic plants having increased disease resistance or tolerance compared to a control plant. Sequence information related to these polynucleotides and polypeptides can also be used in bioinformatic search methods to identify related sequences and is also disclosed.

FIELD OF THE INVENTION

The present invention relates to compositions and methods for increasing the tolerance or resistance of a plant to one or more pathogens.

BACKGROUND OF THE INVENTION

In the broadest sense, the definition of plant disease includes anything that damages plant health. More commonly, plant disease refers to “biotic disease”, that is, the adverse effects of infectious pathogens that multiply on or within a plant and have the potential to spread to other plants. Plant pathogen injury may affect any part of a plant, and may include defoliation, chlorosis, stunting, lesions, loss of photosynthesis, distortions, necrosis, and death. All of these symptoms ultimately result in yield loss in commercially valuable species.

Plant disease management is a considerable expense in crop production worldwide. Despite this expenditure, plant diseases significantly reduce worldwide crop productivity. Fungicides, insecticides, and anti-bacterial treatments are expensive, and their application poses both environmental and health risks.

The use of genetic engineering technologies to enhance the natural ability of plants to tolerate or resist pathogen attack holds great potential for enhancing yields while reducing chemical inputs. Manipulation of valuable traits such as disease tolerance or resistance may be achieved by altering the expression of critical regulatory molecules that are often conserved between diverse plant species. Related conserved regulatory molecules may be originally discovered in a model system (for example, in Arabidopsis) and homologous, functional molecules then discovered in other plant species. Regulatory molecules include transcription factors, proteins that increase or decrease (induce or repress) the rate of transcription of a particular gene or sets of genes. These proteins modulate cellular processes, which results in differential levels of gene expression at various developmental stages, in different tissues and cell types, and in response to different exogenous (e.g., environmental) and endogenous stimuli throughout the life cycle of the organism. Transformed and transgenic plants that comprise cells having altered levels of at least one selected transcription factor, for example, may possess advantageous or desirable traits. Strategies for manipulating traits by altering a plant cell's transcription factor content can therefore result in plants and crops with new and/or improved commercially valuable properties, including broad-spectrum resistance. Although enhanced disease resistance caused by the overexpression of defense gene regulators or signal transduction components has been reported previously (for example, see Cao and Dong (1998) Proc. Natl. Acad. Sci. USA 95: 6531-6653; Century et al. (1997) Science 278: 1963-1965; and Oldroyd and Staskawicz (1998) Proc. Natl. Acad. Sci. USA 95: 10300-10305), expression of these regulatory genes did not result in broad spectrum resistance to both biotrophic and necrotrophic pathogens.

The transcription factor G28 (GenBank accession number AB008103; SEQ ID NO: 2) is a downstream component of an ethylene (ET) response pathway (Fujimoto et al. (2000) Plant Cell 12: 393-404) and is a member of a family of structurally related transcription factors that contain ERF (ethylene response factor) domains that activate target genes containing a so-called ethylene responsive element (ERE; GCC box; Chao et al. (1997) Cell 89: 1133-1144; Ohme-Takagi et al. (1995) Plant Cell 7: 173-182; Solano and Ecker (1998) Curr. Opin. Plant Biol. 1: 393-398; Solano et al. (1998) Genes Dev. 12: 3703-3714; Stepanova et al. (2000) Curr. Opin. Plant Biol. 3: 353-360). The ERF domain that binds the ERE is a novel DNA binding element found only in plants. In addition to G28, the tomato ERF domain containing proteins Pti4, Pti5 and Pti6 have been implicated in a defense response pathway that acts downstream of the tomato resistance gene PTO (Gu et al. (2000) Plant Cell 12: 771-786; Jia and Martin (1999) Plant Mol. Biol. 40: 455-465; Thara et al. (1999) Plant J. 20: 475-483; Zhou et al. (1997) EMBO J. 16: 3207-3218). Pti4, in particular, is a relatively close homolog of AtERF1 and may function similarly to AtERF1. Indeed, recent work has shown that over-expression of Pti4 in transgenic Arabidopsis plants leads to enhanced resistance to E. orontii, similar to the resistance observed in Arabidopsis plants overexpressing G28 (Gu et al. (2002) Plant Cell 14: 817-831).

We have identified polynucleotides encoding transcription factors, including G28 and related sequences such as G3430 (SEQ ID NO: 9), paralogs and orthologs, developed numerous transgenic plants using these polynucleotides, and analyzed the plants for disease resistance or tolerance. In so doing, we have identified important polynucleotide and polypeptide sequences for producing commercially valuable plants and crops as well as the methods for making them and using them. Other aspects and embodiments of the invention are described below and can be derived from the teachings of this disclosure as a whole.

SUMMARY OF THE INVENTION

The present invention pertains to recombinant polynucleotides encoding AP2 transcription factor polypeptides, specifically members of the G28 clade of transcription factor polypeptides. The sequences of the invention include polynucleotides and polypeptides derived from both dicots and monocots. The polypeptide sequences from monocots also contain a subsequence identified as Motif Y (exemplified by SEQ ID NO: 55). Sequences of the invention are considered to be those that are related to the transcription factor sequences of the invention and related sequences, produced artificially or found in plants, including, for example, polypeptide sequences that are substantially identical with the sequences found in the Sequence Listing, or polynucleotide sequences that hybridize over their full length to the polynucleotides in the Sequence Listing under stringent conditions. This includes SEQ ID NO: 9, G3430, or the complement of SEQ ID NO: 9. An example of stringent conditions given in this disclosure includes two wash steps of 6×SSC at 65° C., each step being 10-30 minutes in duration.

The invention also pertains to transgenic monocot plants that contain the recombinant polynucleotide just described (that is, a polynucleotide encoding a member of the G28 clade of transcription factors that contains a Motif Y). These transgenic monocot plants have enhanced tolerance to fungal disease due to the expression of the recombinant polynucleotide. The transgenic monocotyledonous plants of the invention may also have increased tolerance or resistance, as compared to a control plant, to more than one pathogen. The pathogens may include, for example, diverse fungal pathogens including Botrytis, Fusarium, Erysiphe, and Sclerotinia.

The invention also pertains to a method for increasing the tolerance or resistance of a monocot plant to a pathogen. This is accomplished by providing an expression vector comprising:

-   (i) a polynucleotide sequence encoding a polypeptide comprising a Motif Y that is at least 82% identical to the Motif Y of SEQ ID NO: 55; and
-   (ii) regulatory elements flanking the polynucleotide sequence; these regulatory elements are able to control expression of said polynucleotide sequence in a target monocot plant.

The target monocot plant is then transformed with the expression vector to generate a transformed monocot plant capable of expressing the polynucleotide sequence. These steps thus increase the tolerance or resistance of the monocot plant to a pathogen, as compared to the tolerance or resistance level of a control plant.

The invention also pertains to a method for reducing yield loss in a monocot plant due to plant disease. The plant diseases may be caused by more than one type of pathogen, including fungal pathogens such as Botrytis, Fusarium, Erysiphe, and Sclerotinia. Similar to the method for increasing the tolerance or resistance of a monocot plant to a pathogen, noted above, the method steps include first providing an expression vector comprising:

-   (i) a polynucleotide sequence encoding a polypeptide comprising a Motif Y that is at least 82% identical to the Motif Y of SEQ ID NO: 55; and
-   (ii) regulatory elements flanking the polynucleotide sequence.

The target monocot plant is then transformed with the expression vector to generate a transformed monocot plant capable of expressing the polynucleotide sequence, and the plant is then grown. These steps increase the tolerance or resistance of the monocot plant to at least one pathogen, as compared to the tolerance or resistance level of a control plant that has the same disease and is infected by the same pathogen. This results in a smaller yield loss for the transformed monocot plant than the loss experienced by the control plant, when the transformed and non-transformed monocot plants are challenged with the same disease pathogen or pathogens.

BRIEF DESCRIPTION OF THE SEQUENCE LISTING, TABLES, AND DRAWINGS

The Sequence Listing provides exemplary polynucleotide and polypeptide sequences of the invention. The traits associated with the use of the sequences are included in the Examples.

CD-ROM1 and CD-ROM2 are identical read-only memory computer-readable compact discs, and contain copies of the Sequence Listing in ASCII text format. The Sequence Listing is named “MBI0052PCT.ST25.txt” and is 97 kilobytes in size. The copies of the Sequence Listing on the CD-ROM discs are hereby incorporated by reference in their entirety.

FIG. 1 shows a conservative estimate of phylogenetic relationships among the orders of flowering plants (modified from Angiosperm Phylogeny Group (1998) Ann. Missouri Bot. Gard. 84: 1-49). Those plants with a single cotyledon (monocots) are a monophyletic clade nested within at least two major lineages of dicots; the eudicots are further divided into rosids and asterids.

Arabidopsis is a rosid eudicot classified within the order Brassicales; rice is a member of the monocot order Poales. FIG. 1 was adapted from Daly et al. (2001) Plant Physiol. 127: 1328-1333.

FIG. 2 shows a phylogenetic dendrogram depicting phylogenetic relationships of higher plant taxa, including clades containing tomato and Arabidopsis; adapted from Ku et al. (2000) Proc. Natl. Acad. Sci. USA 97: 9121-9126; and Chase et al. (1993) Ann. Missouri Bot. Gard. 80: 528-580.

FIGS. 3A-3G show an alignment of the G28 clade of transcription factor polypeptides (SEQ ID NO: 2) and polypeptide sequences encoded by polynucleotide sequences that are paralogous or orthologous to G28. The alignment was produced using Clustal X 1.81. The AP2 domains are indicated by the horizontal line near the top of FIGS. 3D-3F. The monocot Motif Y subsequences appear in the boxes in FIGS. 3A and 3B.

FIG. 4 depicts a phylogenetic tree of several members of the G28 clade of transcription factor polypeptides, identified through BLAST analysis of proprietary (using corn, soy and rice genes) and public data sources (all plant species). This tree was generated as a Clustal X 1.81 alignment: MEGA2 tree, Maximum Parsimony, bootstrap consensus. Representative sequences of the G28 clade of transcription factor polypeptides may be found within the large box. The smaller box denotes representative members of the G3430 subclade.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

As used herein and in the appended claims, the singular forms “a”, “an”, and “the” include the plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to “a host cell” includes a plurality of such host cells, and a reference to “an antibody” is a reference to one or more antibodies and equivalents thereof known to those skilled in the art, and so forth.

Definitions

“TDR” (in uppercase letters) refers generally to a Transcriptional regulator of Disease Resistance protein sequence of the present invention, including SEQ ID NOs: 2, 4, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 60, paralogs, orthologs, equivalogs, and fragments thereof. The term “tdr” (in lowercase letters) refers generally to a polynucleotide sequence of the present invention, and includes SEQ ID NOs: 1, 3, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 59, paralogs, orthologs, equivalogs, and fragments thereof.

“Tolerance” results from specific, heritable characteristics of a host plant that allow a pathogen to develop and multiply in the host while the host, either by lacking receptor sites for, or by inactivating or compensating for the irritant secretions of the pathogen, still manages to thrive or, in the case of crop plants, produce a good crop. Tolerant plants are susceptible to the pathogen but are not killed by it and generally show little damage from the pathogen (Agrios (1988) Plant Pathology, 3rd ed. Academic Press, N.Y., p. 129).

“Resistance”, also referred to as “true resistance”, results when a plant contains one or more genes that make the plant and a potential pathogen more or less incompatible with each other, either because of a lack of chemical recognition between the host and the pathogen, or because the host plant can defend itself against the pathogen by defense mechanisms already present or activated in response to infection (Agrios (1988) supra p. 125).

“Biologically active” refers to a protein having structural, immunological, regulatory, or chemical functions of a naturally occurring, recombinant or synthetic molecule.

“Complementary” refers to the natural hydrogen bonding by base pairing between purines and pyrimidines. For example, the sequence A-C-G-T (5′→3′) forms hydrogen bonds with its complements A-C-G-T (5′→3′) or A-C-G-U (5′→3′). Two single-stranded molecules may be considered partially complementary, if only some of the nucleotides bond, or “completely complementary” if all of the nucleotides bond. The degree of complementarity between nucleic acid strands affects the efficiency and strength of the hybridization and amplification reactions. “Fully complementary” refers to the case where bonding occurs between every base pair and its complement in a pair of sequences, and the two sequences have the same number of nucleotides.
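The complementarity example above can be checked with a short sketch. The helper names below (`reverse_complement`, `fraction_paired`) are illustrative assumptions and are not part of this disclosure; the sketch considers only standard Watson-Crick pairs, treats U as equivalent to T, and pairs the two strands end to end without searching for the best register. It shows that A-C-G-T is its own complement when both strands are written 5′→3′, which is why the same sequence appears on both sides of the example.

```python
# Watson-Crick pairing for DNA bases (illustrative table, not from the disclosure).
DNA_PAIRS = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the complementary strand of a DNA sequence, written 5'->3'."""
    comp = "".join(DNA_PAIRS[base] for base in reversed(seq.upper()))
    return comp.replace("T", "U") if rna else comp


def fraction_paired(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that base-pair when two 5'->3' strands are
    laid antiparallel, end to end. Dividing by the longer length means
    1.0 is reached only when every base pairs AND the lengths match
    (the 'fully complementary' case)."""
    a = strand_a.upper().replace("U", "T")
    b = strand_b.upper().replace("U", "T")
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    # a[i] faces b[-1 - i] in an antiparallel duplex.
    paired = sum(1 for x, y in zip(a, reversed(b)) if DNA_PAIRS.get(x) == y)
    return paired / longest


if __name__ == "__main__":
    print(reverse_complement("ACGT"))            # DNA complement, 5'->3'
    print(reverse_complement("ACGT", rna=True))  # RNA complement, 5'->3'
    print(fraction_paired("AAGT", "ACGT"))       # partially complementary
```

A value below 1.0 from `fraction_paired` corresponds to the "partially complementary" case described above.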

A “conserved domain” or “conserved region” as used herein refers to a region in heterologous polynucleotide or polypeptide sequences where there is a relatively high degree of sequence identity between the distinct sequences. An AP2 domain that is present in a member of AP2 transcription factor family is an example of a conserved domain. With respect to polynucleotides encoding presently disclosed transcription factors, a conserved domain is preferably at least 10 base pairs (bp) in length. A “conserved domain”, with respect to presently disclosed AP2 domains, refers to a domain within a transcription factor family that exhibits a higher degree of sequence homology, such as at least 60% sequence identity including conservative substitutions, and more preferably at least 75% sequence identity, and even more preferably at least 83%, or at least about 84%, or at least about 86%, or at least about 89%, or at least about 90%, or at least about 92%, or at least about 95%, or at least about 96% amino acid residue sequence identity to the conserved domain. A “conserved domain”, with respect to presently disclosed “Motif Y”, refers to a domain within a monocot AP2 transcription factor sequence that exhibits a high degree of sequence homology to the Motif Y found in SEQ ID NO: 55, having at least 82% sequence identity with the Motif Y found in SEQ ID NO: 55.

A fragment or domain can be referred to as outside a conserved domain, a consensus sequence, or a consensus DNA-binding site that is known to exist or that exists for a particular transcription factor class, family, or sub-family. In this case, the fragment or domain will not include the exact amino acids of a consensus sequence or consensus DNA-binding site of a transcription factor class, family or sub-family, or the exact amino acids of a particular transcription factor consensus sequence or consensus DNA-binding site. Furthermore, a particular fragment, region, or domain of a polypeptide, or a polynucleotide encoding a polypeptide, can be “outside a conserved domain” if all the amino acids of the fragment, region, or domain fall outside of a defined conserved domain(s) for a polypeptide or protein. Sequences having lesser degrees of identity but comparable biological activity are considered to be equivalents.

As one of ordinary skill in the art recognizes, conserved domains of transcription factors may be identified as regions or domains of identity to a specific consensus sequence (see, for example, Riechmann et al. (2000) Science 290: 2105-2110). In the subject invention, the plant transcription factors belong to the AP2 (APETALA2) domain transcription factor family (Riechmann and Meyerowitz (1998) Biol. Chem. 379: 633-646).

The conserved domains for some of the transcription factor polypeptides in the Sequence Listing are shown in FIGS. 3A-3B and 3D-3E. A comparison of the regions of the polypeptides in the Sequence Listing, or of those in FIGS. 3A-3B and 3D-3E, allows one of skill in the art to identify conserved domain(s) for any of the polypeptides listed or referred to in this disclosure.

“Derivative” refers to the chemical modification of a nucleic acid molecule or amino acid sequence. Chemical modifications can include replacement of hydrogen by an alkyl, acyl, or amino group or glycosylation, pegylation, or any similar process that retains or enhances biological activity or lifespan of the molecule or sequence.

“Fragment” with respect to a polynucleotide refers to a clone or any part of a nucleic acid molecule that retains a usable, functional characteristic. Fragments include oligonucleotides that may be used in hybridization or amplification technologies or in regulation of replication, transcription or translation.

“Fragment” with respect to polypeptide may also include subsequences of polypeptides and protein molecules, or a subsequence of the polypeptide. Fragments may have uses in that they may have antigenic potential. In some cases, the fragment or domain is a subsequence of the polypeptide that performs at least one biological function of the intact polypeptide in substantially the same manner, or to a similar extent, as does the intact polypeptide. For example, a polypeptide fragment can comprise a recognizable structural motif or functional domain such as a DNA-binding site or domain that binds to a DNA promoter region, an activation domain, or a domain for protein-protein interactions, and may initiate transcription. Fragments can vary in size from as few as 3 amino acids to the full length of the intact polypeptide, but are preferably at least about 30 amino acids in length and more preferably at least about 60 amino acids in length. Exemplary polypeptide fragments are the first twenty consecutive amino acids of a mammalian protein encoded by the first twenty consecutive amino acids of the transcription factor polypeptides listed in the Sequence Listing.

Exemplary fragments also include fragments that comprise a conserved domain of a transcription factor. An example of such an exemplary fragment would include amino acid residues 45-61 of G3430 (SEQ ID NO: 10), as noted in FIGS. 3A-3B.

“Gene” or “gene sequence” refers to the partial or complete coding sequence of a gene, its complement, and its 5′ or 3′ untranslated regions. A gene is also a functional unit of inheritance, and in physical terms is a particular segment or sequence of nucleotides along a molecule of DNA (or RNA, in the case of RNA viruses) involved in producing a polypeptide chain. The polypeptide chain may be subjected to subsequent processing to obtain a functional protein or polypeptide. A gene may be isolated, partially isolated, or be found within an organism's genome. By way of example, a transcription factor gene encodes a transcription factor polypeptide, which may be functional or require processing to function as an initiator of transcription.

Operationally, genes may be defined by the cis-trans test, a genetic test that determines whether two mutations occur in the same gene and that may be used to determine the limits of the genetically active unit (Rieger et al. (1976) Glossary of Genetics and Cytogenetics: Classical and Molecular, 4th ed., Springer Verlag, Berlin). A gene generally includes regions preceding (“leaders”; upstream) and following (“trailers”; downstream) of the coding region. A gene may also include intervening, non-coding sequences, referred to as “introns”, located between individual coding segments, referred to as “exons”.

Most genes have an associated promoter region, a regulatory sequence 5′ of the transcription initiation codon (there are some genes that do not have an identifiable promoter). The function of a gene may also be regulated by enhancers, operators, and other regulatory elements.

“Homology” refers to sequence similarity between a reference sequence and at least a fragment of a newly sequenced clone insert or its encoded amino acid sequence.

“Identity” or “similarity” refers to sequence similarity between two polynucleotide sequences or between two polypeptide sequences, with identity being a more strict comparison. The phrases “percent identity” and “identity” refer to the percentage of sequence similarity found in a comparison of two or more polynucleotide sequences or two or more polypeptide sequences. “Sequence similarity” refers to the percent similarity in base pair sequence (as determined by any suitable method) between two or more polynucleotide sequences. Two or more sequences can be anywhere from 0-100% similar, or any integer value therebetween. Identity or similarity can be determined by comparing a position in each sequence that may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same nucleotide base or amino acid, then the molecules are identical at that position. A degree of similarity or identity between polynucleotide sequences is a function of the number of identical or matching nucleotides at positions shared by the polynucleotide sequences. A degree of identity of polypeptide sequences is a function of the number of identical amino acids at positions shared by the polypeptide sequences. A degree of homology or similarity of polypeptide sequences is a function of the number of amino acids at positions shared by the polypeptide sequences.
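As a minimal sketch of the position-by-position comparison just described, percent identity between two already aligned sequences can be computed as follows. The function names, the gap character, and the choice of denominator (only positions occupied in both sequences) are assumptions made for illustration; alignment programs differ in how they count gapped positions, and the example strings are made-up toy sequences rather than sequences from the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of shared (non-gap) alignment positions holding the same residue.

    Both inputs must come from the same alignment, so they have equal length.
    Positions where either sequence has a gap are skipped (an assumed
    convention; other tools may count them as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    shared = identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # position not shared by both sequences
        shared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / shared if shared else 0.0


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair reaches a stated identity cutoff (e.g. 75 or 82)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy aligned sequences, invented for this example.
    print(percent_identity("ACGT-A", "ACCTTA"))
    print(meets_threshold("ACGT-A", "ACCTTA", 82.0))
```

The same comparison applies to amino acid sequences; only the alphabet of the input strings changes.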

With regard to polypeptides, the terms “substantial identity” or “substantially identical” refer to sequences of sufficient structural similarity to the transcription factors in the Sequence Listing to produce similar function when expressed or overexpressed in a plant. In the present invention, similar functions confer increased tolerance or resistance to pathogens. Sequences that are at least 75% identical (e.g., in their AP2 domains) or at least 82% identical (e.g., in their Motif Ys) have been discovered and many of these are expected to have similar function as G28 and G3430 when expressed or overexpressed in plants. Thus, these sequences are considered to have substantial identity with G28 and G3430. Sequences having lesser degrees of identity but comparable biological activity are considered to be equivalents. The structure required to maintain proper functionality is related to the tertiary structure of the polypeptide. There are discrete domains and motifs within a transcription factor that must be present within the polypeptide to confer function and specificity. These specific structures are required so that interactive sequences will be properly oriented to retain the desired activity. “Substantial identity” may thus also be used with regard to subsequences, for example, motifs, that are of sufficient structure and similarity, being at least 75% identical or at least 82% identical to similar motifs in other related sequences so that each confers or is required for increased tolerance or resistance to pathogens.

“Alignment” refers to a number of nucleotide bases or amino acid residue sequences aligned by lengthwise comparison so that components in common (i.e., nucleotide bases or amino acid residues) may be visually and readily identified. The fraction or percentage of components in common is related to the homology or identity between the sequences. Alignments such as those of FIG. 3 may be used to identify conserved domains and relatedness within these domains. An alignment may suitably be determined by means of computer programs known in the art, such as MACVECTOR (Accelrys, Inc., San Diego, Calif.).

The terms “highly stringent” or “highly stringent condition” refer to conditions that permit hybridization of DNA strands whose sequences are highly complementary, wherein these same conditions exclude hybridization of significantly mismatched DNAs. Polynucleotide sequences capable of hybridizing under stringent conditions with the polynucleotides of the present invention may be, for example, variants of the disclosed polynucleotide sequences, including allelic or splice variants, or sequences that encode orthologs or paralogs of presently disclosed polypeptides. Nucleic acid hybridization methods are disclosed in detail by Kashima et al. (1985) Nature 313: 402-404, and Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.; and by Haymes et al. (1985) Nucleic Acid Hybridization: A Practical Approach, IRL Press, Washington, D.C., which references are incorporated herein by reference.

In general, stringency is determined by the temperature, ionic strength, and concentration of denaturing agents (e.g., formamide) used in a hybridization and washing procedure (for a more detailed description of establishing and determining stringency, see below). The degree to which two nucleic acids hybridize under various conditions of stringency is correlated with the extent of their similarity. Thus, similar nucleic acid sequences from a variety of sources, such as within a plant's genome (as in the case of paralogs) or from another plant (as in the case of orthologs) that may perform similar functions can be isolated on the basis of their ability to hybridize with known transcription factor sequences. Numerous variations are possible in the conditions and means by which nucleic acid hybridization can be performed to isolate transcription factor sequences having similarity to transcription factor sequences known in the art and are not limited to those explicitly disclosed herein. Such an approach may be used to isolate polynucleotide sequences having various degrees of similarity with disclosed transcription factor sequences, such as, for example, transcription factors having 60% identity, or more preferably greater than about 70% identity, most preferably 72% or greater identity with disclosed transcription factors.

The term “equivalog” describes members of a set of homologous proteins that are conserved with respect to function since their last common ancestor. Related proteins are grouped into equivalog families, and otherwise into protein families with other hierarchically defined homology types. This definition is provided at the Institute for Genomic Research (TIGR) world wide web (www) website, “tigr.org” under the heading “Terms associated with TIGRFAMs”.

The term “variant”, as used herein, may refer to polynucleotides or polypeptides that differ from the presently disclosed polynucleotides or polypeptides, respectively, in sequence from each other, and as set forth below.

With regard to polynucleotide variants, differences between presently disclosed polynucleotides and their variants are limited so that the nucleotide sequences of the former and the latter are closely similar overall and, in many regions, identical. The degeneracy of the genetic code dictates that many different variant polynucleotides can encode identical and/or substantially similar polypeptides in addition to those sequences illustrated in the Sequence Listing. Due to this degeneracy, differences between presently disclosed polynucleotides and variant nucleotide sequences may be silent in any given region or over the entire length of the polypeptide (i.e., the amino acids encoded by the polynucleotide are the same, and the variant polynucleotide sequence thus encodes the same amino acid sequence in that region or entire length of the presently disclosed polynucleotide). Variant nucleotide sequences may encode different amino acid sequences, in which case such nucleotide differences will result in amino acid substitutions, additions, deletions, insertions, truncations or fusions with respect to the similar disclosed polynucleotide sequences. These variations result in polynucleotide variants encoding polypeptides that share at least one functional characteristic (i.e., a presently disclosed transcription factor and a variant will confer at least one of the same functions to a plant).
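The effect of codon degeneracy described above can be illustrated with a short sketch that translates two different coding sequences with the standard genetic code and compares the resulting peptides. The function names and the nine-nucleotide example sequences are hypothetical and chosen only for illustration; they are not taken from the Sequence Listing.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_seq: str) -> str:
    """Translate a DNA coding sequence in frame 1; '*' marks a stop codon.
    Trailing bases that do not complete a codon are ignored."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def is_silent_variant(reference: str, variant: str) -> bool:
    """True if the nucleotide sequences differ but encode the same peptide."""
    return (reference.upper() != variant.upper()
            and translate(reference) == translate(variant))


if __name__ == "__main__":
    reference = "ATGGCTTTA"   # hypothetical sequence: Met-Ala-Leu
    silent = "ATGGCCCTG"      # different codons, same amino acids
    missense = "ATGGATTTA"    # second codon now encodes Asp
    print(translate(reference), translate(silent), translate(missense))
    print(is_silent_variant(reference, silent))
    print(is_silent_variant(reference, missense))
```

A variant for which `is_silent_variant` returns False corresponds to the second case described above, in which nucleotide differences produce amino acid substitutions or other changes.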

Within the scope of the invention is a variant of a nucleic acid listed in the Sequence Listing, that is, one having a sequence that differs from one of the polynucleotide sequences in the Sequence Listing, or a complementary sequence, that encodes a functionally equivalent polypeptide (i.e., a polypeptide having some degree of equivalent or similar biological activity) but differs in sequence from the sequence in the Sequence Listing, due to degeneracy in the genetic code.

“Allelic variant” or “polynucleotide allelic variant” refers to any of two or more alternative forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, and may result in phenotypic polymorphism within populations. Gene mutations may be “silent” or may encode polypeptides having altered amino acid sequences. “Allelic variant” and “polypeptide allelic variant” may also be used with respect to polypeptides, and in this case the terms refer to a polypeptide encoded by an allelic variant of a gene.

“Splice variant” or “polynucleotide splice variant” as used herein refers to alternative forms of RNA transcribed from a gene. Splice variation naturally occurs as a result of alternative sites being spliced within a single transcribed RNA molecule or between separately transcribed RNA molecules, and may result in several different forms of messenger RNA (mRNA) transcribed from the same gene. Thus, splice variants may encode polypeptides having different amino acid sequences, which, in the present context, will have at least one similar function in the organism (splice variation may also give rise to distinct polypeptides having different functions). “Splice variant” or “polypeptide splice variant” may also refer to a polypeptide encoded by a splice variant of a transcribed mRNA.

As used herein, “polynucleotide variants” may also refer to polynucleotide sequences that encode paralogs and orthologs of the presently disclosed polypeptide sequences. “Polypeptide variants” may refer to polypeptide sequences that are paralogs and orthologs of the presently disclosed polypeptide sequences.

“Modulates” refers to a change in activity (biological, chemical, or immunological) or lifespan resulting from specific binding between a molecule and either a nucleic acid molecule or a protein.

“Nucleic acid molecule” refers to an oligonucleotide, polynucleotide or any fragment thereof. It may be DNA or RNA of genomic or synthetic origin, double-stranded or single-stranded, and combined with carbohydrate, lipids, protein, or other materials to perform a particular activity such as transformation or form a useful composition such as a peptide nucleic acid (PNA).

“Polynucleotide” is a nucleic acid molecule comprising a plurality of polymerized nucleotides, e.g., at least about 15 consecutive polymerized nucleotides, optionally at least about 30 consecutive nucleotides, or at least about 50 consecutive nucleotides. A polynucleotide may be a nucleic acid, oligonucleotide, nucleotide, or any fragment thereof. In many instances, a polynucleotide comprises a nucleotide sequence encoding a polypeptide (or protein) or a domain or fragment thereof. Additionally, the polynucleotide may comprise a promoter, an intron, an enhancer region, a polyadenylation site, a translation initiation site, 5′ or 3′ untranslated regions, a reporter gene, a selectable marker, or the like. The polynucleotide can be single stranded or double stranded DNA or RNA. The polynucleotide optionally comprises modified bases or a modified backbone. The polynucleotide can be, e.g., genomic DNA or RNA, a transcript (such as an mRNA), a cDNA, a PCR product, a cloned DNA, a synthetic DNA or RNA, or the like. The polynucleotide can be combined with carbohydrate, lipids, protein, or other materials to perform a particular activity such as transformation or form a useful composition such as a peptide nucleic acid (PNA). The polynucleotide can comprise a sequence in either sense or antisense orientations. “Oligonucleotide” is substantially equivalent to the terms amplimer, primer, oligomer, element, target, and probe and is preferably single stranded.

A “recombinant polynucleotide” is a polynucleotide that is not in its native state, e.g., the polynucleotide comprises a nucleotide sequence not found in nature, or the polynucleotide is in a context other than that in which it is naturally found, e.g., separated from nucleotide sequences with which it typically is in proximity in nature, or adjacent to (or contiguous with) nucleotide sequences with which it typically is not in proximity. For example, the sequence at issue can be cloned into a vector, or otherwise recombined with one or more additional nucleic acids.

An “isolated polynucleotide” is a polynucleotide, whether naturally occurring or recombinant, that is present outside the cell in which it is typically found in nature, whether purified or not. Optionally, an isolated polynucleotide is subject to one or more enrichment or purification procedures, e.g., cell lysis, extraction, centrifugation, precipitation, or the like.

A “polypeptide” is an amino acid sequence comprising a plurality of consecutive polymerized amino acid residues, e.g., at least about 15 consecutive polymerized amino acid residues, optionally at least about 30 consecutive polymerized amino acid residues, or at least about 50 consecutive polymerized amino acid residues. In many instances, a polypeptide comprises a polymerized amino acid residue sequence that is a transcription factor or a domain or portion or fragment thereof. A transcription factor can regulate gene expression and may increase or decrease gene expression in a plant. Additionally, the polypeptide may comprise 1) a localization domain, 2) an activation domain, 3) a repression domain, 4) an oligomerization domain, or 5) a DNA-binding domain, or the like. The polypeptide optionally comprises modified amino acid residues, naturally occurring amino acid residues not encoded by a codon, or non-naturally occurring amino acid residues.

A “recombinant polypeptide” is a polypeptide produced by translation of a recombinant polynucleotide. A “synthetic polypeptide” is a polypeptide created by consecutive polymerization of isolated amino acid residues using methods well known in the art. An “isolated polypeptide,” whether a naturally occurring or a recombinant polypeptide, is more enriched in (or out of) a cell than the polypeptide in its natural state in a wild-type cell, e.g., more than about 5% enriched, more than about 10% enriched, or more than about 20%, or more than about 50%, or more, enriched, i.e., alternatively denoted: 105%, 110%, 120%, 150% or more, enriched relative to wild type standardized at 100%. Such an enrichment is not the result of a natural response of a wild-type plant. Alternatively, or additionally, the isolated polypeptide is separated from other cellular components with which it is typically associated, e.g., by any of the various protein purification methods herein.

“Portion”, as used herein, refers to any part of a polynucleotide or polypeptide used for any purpose. This includes portions of polypeptides used in the screening of a library of molecules that specifically bind to a portion of a polypeptide or for the production of antibodies.

“Protein” refers to an amino acid sequence, oligopeptide, peptide, polypeptide or portions thereof whether naturally occurring or synthetic.

The term “plant” includes whole plants, shoot vegetative organs/structures (for example, leaves, stems and tubers), roots, flowers and floral organs/structures (for example, bracts, sepals, petals, stamens, carpels, anthers and ovules), seed (including embryo, endosperm, and seed coat) and fruit (the mature ovary), plant tissue (for example, vascular tissue, ground tissue, and the like) and cells (for example, guard cells, egg cells, and the like), and progeny of same. The class of plants that can be used in the method of the invention is generally as broad as the class of higher and lower plants amenable to transformation techniques, including angiosperms (monocotyledonous and dicotyledonous plants), gymnosperms, ferns, horsetails, psilophytes, lycophytes, bryophytes, and multicellular algae. (See for example, FIG. 1, adapted from Daly et al. (2001) Plant Physiol. 127: 1328-1333; FIG. 2, adapted from Ku et al. (2000) Proc. Natl. Acad. Sci. USA 97: 9121-9126; and see also Tudge, in The Variety of Life, Oxford University Press, New York, N.Y. (2000) pp. 547-606).

A “transgenic plant” refers to a plant that contains genetic material not found in a wild-type plant of the same species, variety or cultivar. The genetic material may include a transgene, an insertional mutagenesis event (such as by transposon or T-DNA insertional mutagenesis), an activation tagging sequence, a mutated sequence, a homologous recombination event or a sequence modified by chimeraplasty. Typically, the foreign genetic material has been introduced into the plant by human manipulation, but any method can be used as one of skill in the art recognizes.

A transgenic plant may contain an expression vector or cassette. The expression cassette typically comprises a polypeptide-encoding sequence operably linked to (i.e., under regulatory control of) appropriate inducible or constitutive regulatory sequences that allow for the expression of the polypeptide.

The expression cassette can be introduced into a plant by transformation or by breeding after transformation of a parent plant. A plant refers to a whole plant, including seedlings and mature plants, as well as to a plant part, such as seed, fruit, leaf, or root, plant tissue, plant cells or any other plant material, e.g., a plant explant, as well as to progeny thereof, and to in vitro systems that mimic biochemical or cellular components or processes in a cell.

“Substrate” refers to any rigid or semi-rigid support to which nucleic acid molecules or proteins are bound and includes membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, capillaries or other tubing, plates, polymers, and microparticles with a variety of surface forms including wells, trenches, pins, channels and pores.

A “trait” refers to a physiological, morphological, biochemical, or physical characteristic of a plant or particular plant material or cell. In some instances, this characteristic is visible to the human eye, such as seed or plant size, or can be measured by biochemical techniques, such as detecting the protein, starch, or oil content of seed or leaves, or by observation of a metabolic or physiological process, e.g., by measuring uptake of carbon dioxide, or by the observation of the expression level of a gene or genes, e.g., by employing Northern analysis, RT-PCR, microarray gene expression assays, or reporter gene expression systems, or by agricultural observations such as stress tolerance, yield, or pathogen tolerance. However, any technique can be used to measure the amount of, comparative level of, or difference in any selected chemical compound or macromolecule in the transgenic plants.

“Trait modification” refers to a detectable difference in a characteristic in a plant ectopically expressing a polynucleotide or polypeptide of the present invention relative to a plant not doing so, such as a wild-type plant. In some cases, the trait modification can be evaluated quantitatively. For example, the trait modification can entail at least about a 2% increase or decrease in an observed trait (difference), at least a 5% difference, at least about a 10% difference, at least about a 20% difference, at least about a 30%, at least about a 50%, at least about a 70%, or at least about a 100%, or an even greater difference compared with a wild-type plant. It is known that there can be a natural variation in the modified trait. Therefore, the trait modification observed entails a change of the normal distribution of the trait in the plants compared with the distribution observed in wild-type plants.
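The quantitative comparison described above can be sketched as follows. This is an illustrative Python example only: the function names and the measurements are hypothetical, and the sketch compares group means, so it does not capture the shift in the trait distribution that the definition also contemplates.

```python
# Percentage levels recited in the definition of "trait modification".
THRESHOLDS = (2, 5, 10, 20, 30, 50, 70, 100)


def percent_difference(transgenic_mean, wild_type_mean):
    """Signed percent change of the transgenic mean relative to wild type."""
    if wild_type_mean == 0:
        raise ValueError("wild-type mean must be non-zero")
    return 100.0 * (transgenic_mean - wild_type_mean) / wild_type_mean


def highest_threshold_met(transgenic_mean, wild_type_mean):
    """Largest recited level reached by the absolute difference, or None."""
    magnitude = abs(percent_difference(transgenic_mean, wild_type_mean))
    met = [level for level in THRESHOLDS if magnitude >= level]
    return max(met) if met else None


# Hypothetical trait measurements (arbitrary units).
print(percent_difference(52.0, 40.0))
print(highest_threshold_met(52.0, 40.0))
```

An increase and a decrease of the same magnitude are treated alike here, consistent with the "increase or decrease" wording of the definition.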

“Transcript profile” refers to the expression levels of a set of genes in a cell in a particular state, particularly by comparison with the expression levels of that same set of genes in a cell of the same type in a reference state. The transcript profile of a particular transcription factor in a suspension cell corresponds to the expression levels of a set of genes in a cell overexpressing that transcription factor, compared with the expression levels of that same set of genes in a suspension cell that has normal levels of that transcription factor. The transcript profile can be presented as a list of those genes whose expression level is significantly different between the two treatments, and the difference ratios. Differences and similarities between expression levels may be evaluated and calculated using statistical and clustering methods.
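A transcript profile of this kind, a list of genes with their difference ratios, might be assembled as in the following Python sketch. The gene names and expression values are hypothetical, and the simple fold-change cutoff is an assumption standing in for the statistical tests that would normally decide which differences are significant.

```python
def transcript_profile(overexpressor, reference, min_fold=2.0):
    """Return {gene: ratio} for genes whose expression changes at least min_fold.

    overexpressor and reference map gene names to expression levels.
    min_fold is an assumed cutoff, not a value taken from the text.
    """
    profile = {}
    for gene, ref_level in reference.items():
        level = overexpressor.get(gene)
        # Skip genes that are missing or have non-positive measurements.
        if level is None or level <= 0 or ref_level <= 0:
            continue
        ratio = level / ref_level
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            profile[gene] = ratio
    return profile


# Hypothetical expression levels (arbitrary units).
reference_cells = {"gene_a": 10.0, "gene_b": 20.0, "gene_c": 30.0}
overexpressing_cells = {"gene_a": 40.0, "gene_b": 22.0, "gene_c": 10.0}

profile = transcript_profile(overexpressing_cells, reference_cells)
for gene, ratio in sorted(profile.items()):
    print(gene, round(ratio, 2))
```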

“Wild type” or “wild-type”, as used herein, refers to a plant cell, seed, plant component, plant tissue, plant organ or whole plant that has not been genetically modified or treated in an experimental sense. Wild-type cells, seed, components, tissue, organs or whole plants may be used as controls to compare levels of expression and the extent and nature of trait modification with cells, tissue or plants of the same species in which a transcription factor expression is altered, e.g., in that it has been knocked out, overexpressed, or ectopically expressed.

A “control plant” as used herein refers to a plant cell, seed, plant component, plant tissue, plant organ or whole plant used to compare against a transgenic or genetically modified plant for the purpose of identifying an enhanced phenotype in the transgenic or genetically modified plant. A control plant may in some cases be a transgenic plant line that comprises an empty vector or marker gene, but does not contain the recombinant polynucleotide of the present invention that is expressed in the transgenic or genetically modified plant being evaluated. In general, a control plant is a plant of the same line or variety as the transgenic or genetically modified plant being tested. A suitable control plant would include a genetically unaltered or non-transgenic plant of the parental line used to generate a transgenic plant herein.

Polypeptides and Polynucleotides of the Invention

The present invention provides, among other things, transcription factors, and transcription factor homolog polypeptides, and isolated or recombinant polynucleotides encoding the polypeptides, or novel sequence variant polypeptides or polynucleotides encoding novel variants of transcription factors derived from the specific sequences provided in the Sequence Listing. Also provided are methods for increasing a plant's tolerance to one or more pathogens or abiotic stresses. These methods are based on the ability to alter the expression of critical regulatory molecules that may be conserved between diverse plant species.

Related conserved regulatory molecules may be originally discovered in a model system such as Arabidopsis and homologous, functional molecules then discovered in other plant species. The latter may then be used to confer tolerance to one or more pathogens or abiotic stresses in diverse plant species.

Exemplary polynucleotides encoding the polypeptides of the invention were identified in the Arabidopsis thaliana GenBank database using publicly available sequence analysis programs and parameters. Sequences initially identified were then further characterized to identify sequences comprising specified sequence strings corresponding to sequence motifs present in families of known transcription factors. In addition, further exemplary polynucleotides encoding the polypeptides of the invention were identified in the plant GenBank database using publicly available sequence analysis programs and parameters. Sequences initially identified were then further characterized to identify sequences comprising specified sequence strings corresponding to sequence motifs present in families of known transcription factors. Polynucleotide sequences meeting such criteria were confirmed as transcription factors.

Additional polynucleotides of the invention were identified by screening Arabidopsis thaliana and/or other plant cDNA libraries with probes corresponding to known transcription factors under low stringency hybridization conditions. Additional sequences, including full-length coding sequences, were subsequently recovered by the rapid amplification of cDNA ends (RACE) procedure, using a commercially available kit according to the manufacturer's instructions. Where necessary, multiple rounds of RACE were performed to isolate 5′ and 3′ ends. The full-length cDNA was then recovered by a routine end-to-end polymerase chain reaction (PCR) using primers specific to the isolated 5′ and 3′ ends. Exemplary sequences are provided in the Sequence Listing.

These sequences and others derived from diverse species and found in the Sequence Listing have been ectopically expressed in overexpressor or knockout plants. The changes in the characteristic(s) or trait(s) of the plants were then observed and found to confer increased abiotic stress or disease tolerance. Therefore, the polynucleotides and polypeptides can be used to improve desirable characteristics of plants.

The polynucleotides of the invention were also ectopically expressed in overexpressor plant cells and the changes in the expression levels of a number of genes, polynucleotides, and/or proteins of the plant cells observed. Therefore, the polynucleotides and polypeptides can be used to change expression levels of genes, polynucleotides, and/or proteins of plants.

The AP2 family. AP2 (APETALA2) and EREBPs (Ethylene-Responsive Element Binding Proteins) are the prototypic members of a family of transcription factors unique to plants, whose distinguishing characteristic is that they contain the so-called AP2 DNA-binding domain (Riechmann and Meyerowitz (1998) Biol. Chem. 379: 633-646). The AP2 domain was first recognized as a repeated motif within the Arabidopsis thaliana AP2 protein (Jofuku et al. (1994) Plant Cell 6: 1211-1225). Four DNA-binding proteins from tobacco were identified that interact with a sequence that is essential for the responsiveness of some promoters to the plant hormone ethylene, and were designated as ethylene-responsive element binding proteins (EREBPs; Ohme-Takagi et al. (1995) supra). The DNA-binding domain of EREBP-2 was mapped to a region that was common to all four proteins (Ohme-Takagi et al. (1995) supra), and that was found to be closely related to the AP2 domain (Weigel (1995) Plant Cell 7: 388-389) but that did not bear sequence similarity to previously known DNA-binding motifs.

AP2/EREBP genes form a large family, with many members known in several plant species (Okamuro et al. (1997) Proc. Natl. Acad. Sci. USA 94: 7076-7081; Riechmann and Meyerowitz (1998) supra). The number of AP2/EREBP genes in the Arabidopsis thaliana genome is approximately 145 (Riechmann et al. (2000) Science 290: 2105-2110). The APETALA2 class contains 14 genes and is characterized by the presence of two AP2 DNA binding domains. The AP2/ERF subfamily is the largest, and includes 125 genes that are involved in abiotic (DREB subgroup) and biotic (ERF subgroup) stress responses. The RAV subgroup includes six genes that all have a B3 DNA binding domain in addition to the AP2 DNA binding domain (Kagaya et al. (1999) Nucleic Acids Res. 27: 470-478).

Arabidopsis AP2 is involved in the specification of sepal and petal identity through its activity as a homeotic gene that forms part of the combinatorial genetic mechanism of floral organ identity determination, and it is also required for normal ovule and seed development (Bowman et al. (1991) Development 112: 1-20; Jofuku et al. (1994) supra). Arabidopsis ANT is required for ovule development and it also plays a role in floral organ growth (Elliott et al. (1996) Plant Cell 8: 155-168; Klucher et al. (1996) Plant Cell 8: 137-153). Finally, maize Gl15 regulates leaf epidermal cell identity (Moose et al. (1996) Genes Dev. 10: 3018-3027).

The attack of a plant by a pathogen may induce defense responses that lead to resistance to the invasion, and these responses are associated with transcriptional activation of defense-related genes, among them those encoding pathogenesis-related (PR) proteins. The involvement of EREBP-like genes in controlling the plant defense response is based on the observation that many PR gene promoters contain a short cis-acting element that mediates their responsiveness to ethylene (ethylene appears to be one of several signal molecules controlling the activation of defense responses). Tobacco EREBP-1, -2, -3, and -4, and tomato Pti4, Pti5 and Pti6 proteins have been shown to recognize such cis-acting elements (Ohme-Takagi et al. (1995) supra; Zhou et al. (1997) EMBO J. 16: 3207-3218). In addition, Pti4, Pti5, and Pti6 proteins have been shown to interact directly with Pto, a protein kinase that confers resistance against Pseudomonas syringae pv tomato (Zhou et al. (1997) supra). Plants are also challenged by adverse environmental conditions such as cold or drought, and EREBP-like proteins appear to be involved in the responses to these abiotic stresses as well. COR (for cold-regulated) gene expression is induced during cold acclimation, the process by which plants increase their resistance to freezing in response to low temperatures. The Arabidopsis EREBP-like gene CBF1 (Stockinger et al. (1997) Proc. Natl. Acad. Sci. USA 94: 1035-1040) is a regulator of the cold acclimation response, because ectopic expression of CBF1 in Arabidopsis transgenic plants induced COR gene expression in the absence of a cold stimulus, and the plant freezing tolerance was increased (Jaglo-Ottosen et al. (1998) Science 280: 104-106). Another Arabidopsis EREBP-like gene, ABI4, is involved in abscisic acid (ABA) signal transduction, because abi4 mutants are insensitive to ABA (ABA is a plant hormone that regulates many agronomically important aspects of plant development; Finkelstein et al. (1998) Plant Cell 10: 1043-1054).

Novel AP2 transcription factor genes and binding motifs in Arabidopsis and other diverse species. G28 corresponds to AtERF1 (GenBank accession number AB008103; Fujimoto et al. (2000) supra). G28 appears as gene AT4g17500 in the annotated sequence of Arabidopsis chromosome 4 (AL161546.2).

AtERF1 has been shown to have GCC-box binding activity; some defense-related genes that are induced by ethylene were found to contain a short cis-acting element known as the GCC-box: AGCCGCC (Ohme-Takagi et al. (1995) supra; and Ohme-Takagi and Shinshi (1990) Plant Mol. Biol. 15: 941-946). Using transient assays in Arabidopsis leaves, AtERF1 was found to be able to act as a GCC-box sequence specific transactivator (Fujimoto et al. (2000) supra).
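As an illustration of how the GCC-box core sequence AGCCGCC might be located in an upstream region, the following Python sketch scans both strands of a DNA sequence. The promoter fragment is hypothetical and constructed for the example; it is not a sequence from any gene discussed herein.

```python
GCC_BOX = "AGCCGCC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_gcc_boxes(promoter):
    """Return sorted (0-based start, strand) tuples for every GCC-box hit.

    A '-' hit means the motif lies on the reverse strand; its start is
    reported in forward-strand coordinates.
    """
    promoter = promoter.upper()
    hits = []
    for strand, motif in (("+", GCC_BOX), ("-", reverse_complement(GCC_BOX))):
        start = promoter.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = promoter.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical promoter fragment with one hit on each strand.
example_promoter = "TTAGCCGCCATAAGGCGGCTTT"
print(find_gcc_boxes(example_promoter))
```

An exact-match scan of this kind identifies candidate elements only; it does not account for flanking sequence, which may also influence binding.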

AtERF1 expression has been described to be induced by ethylene (two- to three-fold increase in AtERF1 transcript levels 12 hours after ethylene treatment; Fujimoto et al. (2000) supra). In the ein2 mutant, the expression of AtERF1 was not induced by ethylene, suggesting that the ethylene induction of AtERF1 is regulated under the ethylene signaling pathway (Fujimoto et al. (2000) supra). AtERF1 expression was also induced by wounding, but not by other abiotic stresses (such as cold, salinity, or drought; Fujimoto et al. (2000) supra).

AtERF-type transcription factors respond to abiotic stress. While ERF-type transcription factors are primarily recognized for responding to a variety of biotic stresses (such as pathogen infection), some ERFs have been characterized as being responsive to abiotic stress. Fujimoto et al. ((2000) Plant Cell 12: 393-404) have shown that AtERF1, AtERF2, AtERF3, AtERF4, and AtERF5, corresponding to G28, G1006, G1005, G6 and G1004 respectively, can respond to various abiotic stresses, including cold, heat, drought, ABA, CHX, and wounding. Genes normally associated with the plant defense response (PR1, PR2, PR5, and peroxidases) have also been shown to be regulated by water stress (Zhu et al. (1995) Plant Physiol. 108: 929-937; Ingram and Bartels (1996) Annu. Rev. Plant Physiol. Plant Mol. Biol. 47: 377-403), suggesting some overlap between the two responses. A target sequence for ERF-type transcription factors has been identified and extensively studied (Hao et al. (1998) J. Biol. Chem. 273: 26857-26861). This target sequence consists of AGCCGCC and has been found in the 5′ upstream regions of genes responding to disease and regulated by ERFs. However, several genes (ARSK1 and dehydrin) known to be induced by ABA, NaCl, cold and wounding also possess a GCC box regulatory element in their 5′ upstream regions (Hwang and Goodman (1995) Plant J. 8: 37-43), suggesting that ERF-type transcription factors may also regulate abiotic stress associated genes.

ERF-type transcription factors in other species. ERF-type transcription factors have been characterized in other species. Tsi1, a tobacco AtERF ortholog, has been shown to be responsive to NaCl, drought, wounding, salicylic acid (SA), ethephon, ABA, and methyl jasmonate (MeJA; Park et al. (2001) Plant Cell 13: 1035-1046). Tsi1 is closely related to At4g27950 (G1750) in Arabidopsis. RT data suggest that G1750 may also have a similar function, although overexpression of G1750 causes some deleterious effects. In tobacco plants, however, overexpression of Tsi1 enhances resistance to both pathogen challenge and osmotic stress (Park et al. (2001) supra). Interestingly, Tsi1 has also been shown to interact specifically with both GCC and DRE regulatory elements. Genes containing DRE elements are known to be regulated in response to abiotic stresses; as such, it is possible that Tsi1 has the ability to regulate the transcription of genes involved in abiotic stresses such as drought.

ERF-type transcription factors are well known to be transcriptional activators of disease responses (Fujimoto et al. (2000) supra; Gu et al. (2000) Plant Cell 12: 771-786; Chen et al. (2002) Plant Cell 14: 559-574; Cheong et al. (2002) Plant Physiol. 129: 661-677; Onate-Sanchez and Singh (2002) Plant Physiol. 128: 1313-1322; Brown et al. (2003) Plant Physiol. 132: 1020-1032; Lorenzo et al. (2003) Plant Cell 15: 165-178) but have not been well characterized as being involved in response to abiotic stress conditions such as drought. Another group of AP2 transcription factors (DREBs), which includes the CBF class, are known to bind DRE elements in genes responding to abiotic stresses such as drought, high salt, and cold (Haake et al. (2002) Plant Physiol. 130: 639-648; Thomashow (2001) Plant Physiol. 125: 89-93; Liu et al. (1998) Plant Cell 10: 1391-1406; Gilmour et al. (2000) Plant Physiol. 124: 1854-1865; and Shinozaki and Yamaguchi-Shinozaki (2000) Curr. Opin. Plant Biol. 3: 217-223). However, there is growing evidence that ERF-type transcription factors can interact with not only the GCC-box, but also with regulatory elements present in genes that are responsive to osmotic stresses. Thus, it is becoming apparent from our studies as well as those of others that some ERF-type transcription factors may play a role in response to drought-related stress.

The role of ERF-type transcription factors in disease responses. The first indication that members of the ERF group might be involved in regulation of plant disease resistance pathways was the identification of Pti4, Pti5 and Pti6 as interactors with the tomato disease resistance protein Pto in yeast 2-hybrid assays (Zhou et al. (1997) EMBO J. 16: 3207-3218). Since that time, several ERF genes have been shown to enhance disease resistance when overexpressed in Arabidopsis or other species. These ERF genes include ERF1 (G1266) of Arabidopsis (Berrocal-Lobo et al. (2002) Plant J. 29: 23-32), Pti4 (Gu et al. (2002) Plant Cell 14: 817-831), and Pti5 (He et al. (2001) Mol. Plant Microbe Interact. 14: 1453-1457) of tomato, Tsi1 of tobacco (Park et al. (2001) supra; Shin et al. (2002) Mol. Plant Microbe Interact. 15: 983-989), and AtERF1 (G28) and TDR1 (G1792) of Arabidopsis.

Regulation of ERF transcription factors by pathogen and small molecule signaling. ERF genes show a variety of stress-regulated expression patterns. Regulation by disease-related stimuli such as ethylene (ET), jasmonic acid (JA), SA, and infection by virulent or avirulent pathogens has been shown for a number of ERF genes (Fujimoto et al. (2000) supra; Gu et al. (2000) supra; Chen et al. (2002) supra; Cheong et al. (2002) supra; Onate-Sanchez and Singh (2002) supra; Brown et al. (2003) supra; Lorenzo et al. (2003) supra). However, some ERF genes are also induced by wounding and abiotic stresses (Fujimoto et al. (2000) supra; Park et al. (2001) supra; Chen et al. (2002) supra; Tournier et al. (2003) FEBS Lett. 550: 149-154). Currently, it is difficult to assess the overall picture of ERF regulation in relation to phylogeny, since different studies have concentrated on different ERF genes, treatments and time points. The advent of the Arabidopsis whole-genome microarray will result in more easily comparable data.

Significantly, several ERF transcription factors that confer enhanced disease resistance when overexpressed, such as ERF1, Pti4, and AtERF1, are transcriptionally regulated by pathogens, ET, and JA (Fujimoto et al. (2000) supra; Onate-Sanchez and Singh (2002) supra; Brown et al. (2003) supra; Lorenzo et al. (2003) supra). ERF1 is induced synergistically by ET and JA, and induction by either hormone is dependent on an intact signal transduction pathway for both hormones, indicating that ERF1 may be a point of integration for ET and JA (Lorenzo et al. (2003) supra). At least four other ERFs are also induced by JA and ET (Brown et al. (2003) supra), implying that other ERFs are probably also important in ET/JA signal transduction. A number of the genes in subgroup 1, including AtERF3 and AtERF4, are thought to act as transcriptional repressors (Fujimoto et al. (2000) supra), and these two genes were found to be induced by ET, JA, and an incompatible pathogen (Brown et al. (2003) supra). The net transcriptional effect on these pathways may be balanced between activation and repression of target genes.

The SA signal transduction pathway can act antagonistically to the ET/JA pathway. Interestingly, Pti4 and AtERF1 are induced by SA as well as by JA and ET (Gu et al. (2000) supra; Onate-Sanchez and Singh (2002) supra). Pti4, Pti5 and Pti6 have been implicated indirectly in regulation of the SA response, perhaps through interaction with other transcription factors, since overexpression of these genes in Arabidopsis induced SA-regulated genes without SA treatment and enhanced the induction seen after SA treatment (Gu et al. (2002) supra).

Post-transcriptional regulation of ERF genes by phosphorylation may be a significant form of regulation. Pti4 has been shown to be phosphorylated specifically by the Pto kinase, and this phosphorylation enhances binding to its target sequence (Gu et al. (2000) supra). Recently, the OsEREBP1 gene of rice has been shown to be phosphorylated by the pathogen-induced MAP kinase BWMK1, and this phosphorylation was shown to enhance its binding to the GCC box (Cheong et al. (2003) Plant Physiol. 132: 1961-1972), suggesting that phosphorylation of ERF proteins may be a common theme. A potential MAPK phosphorylation site has been noted in AtERF5 (Fujimoto et al. (2000) supra).

Target genes regulated by ERF transcription factors. Binding of ERF transcription factors to the target sequence AGCCGCC (the GCC box) has been extensively studied (Hao et al. (1998) supra). This element is found in a number of promoters of pathogenesis-related and ET- or JA-induced genes. However, it is unclear how much overlap there is in target genes for particular ERFs. Recent studies have profiled genes induced in Arabidopsis plants overexpressing ERF1 (Lorenzo et al. (2003) supra) and Pti4 (Chakravarthy et al. (2003) Plant Cell 15: 3033-3050). However, these studies were done with different technology (Affymetrix GeneChip vs. serial analysis of gene expression) and under different conditions, and it is therefore difficult to compare the results directly. There is evidence that flanking sequences can affect the binding of ERFs to the GCC box (Gu et al. (2002) supra; Tournier et al. (2003) supra), so it is likely that different ERFs will regulate somewhat different gene sets. Direct comparisons of transcript profiles from plants overexpressing different ERFs, or of the in vitro binding affinity of multiple ERFs to sites with varied flanking sequences, will likely be necessary to confirm conclusions about the degree of overlap in ERF target sets. Recent chromatin immunoprecipitation experiments with Pti4 suggest that it may also bind non-GCC box promoters, either directly or through interaction with other transcription factors (Chakravarthy et al. (2003) supra). This observation is particularly interesting in light of the hypothesis advanced by Gu et al. ((2002) supra) that Pti4 may regulate SA-induced genes through interaction with other transcription factors.

Identification of Residues and Motifs Unique to G28 Monocot Orthologs.

A number of sequences evolutionarily related to G28 were aligned using Clustal X (version 1.81, June 2000). Additional sequences were included in the alignment that were identified by BLASTP analysis of proprietary and public databases with protein sequences with a high degree of sequence relatedness to G28, particularly in the AP2 domain. A neighbor-joining algorithm comparing the AP2 domains of these sequences was then used to generate a phylogenetic tree, using Clustal X v1.81's phylogenetic capabilities. Based on comparisons of the sequences in the alignment and, in particular, the phylogenetic analysis, the sequences with a common evolutionary history with reference to G28 were found in a separate clade, herein referred to as the “G28 clade of transcription factor polypeptides”, or simply the “G28 clade” (FIG. 4 provides an example of a phylogenetic tree that distinguishes the G28 clade from sequences outside of the clade).
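Neighbor-joining methods operate on a matrix of pairwise distances between aligned sequences. The Python sketch below computes one simple distance of that kind, the proportion of differing residues over ungapped columns. It is offered only as an illustration: the exact distance measure and any corrections applied by Clustal X are not reproduced here, and the aligned fragments are hypothetical rather than actual AP2 domain sequences.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatched residues over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatched = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        mismatched += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatched / compared


def distance_matrix(alignment):
    """Return {(name1, name2): distance} for every pair of aligned sequences."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned fragments, for illustration only.
alignment = {
    "seq_a": "YRGVRQRPWG",
    "seq_b": "YRGVRRRPWG",
    "seq_c": "FRGVR-RPWG",
}
distances = distance_matrix(alignment)
for pair, dist in sorted(distances.items()):
    print(pair, round(dist, 3))
```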

Two sequences in this clade, G28 and a tomato sequence, Pti4, have been shown to confer enhanced disease tolerance when overexpressed in Arabidopsis (Heard (2004) U.S. Pat. No. 6,664,446; and Gu et al. (2002) Plant Cell 14: 817-831). One of the tobacco transcription factor genes has been shown previously to control the expression of basic PR genes, which are known to be involved in disease resistance responses (Kitajima et al. (2000) Plant Cell Physiol. 41: 817-824). Real time PCR experiments have shown that G28 and orthologs in Brassica napus (canola; orthologs Bn bh594074, Bn bh454277), Zea mays (G3661) and Oryza sativa (G3430) were induced by the disease-related hormone treatments MeJA and SA in the plant species in which they are found, consistent with a role for these genes in disease resistance. These observations support the premise that G28 clade sequences have conserved function across monocot and dicot lineages, and that the G28 clade comprises a number of genes involved in the control of disease resistance genes and the regulation of disease resistance.

After the G28 clade was identified, re-examination of the alignment of the sequences of the G28 clade of transcription factor polypeptides indicated a high degree of conservation of the AP2 DNA binding domain in all members of the clade. This enabled the definition of those sequence elements that define, structurally, the protein sequences comprising the G28 clade. There is also a high degree of conservation in additional motifs in all members of the clade. For example, residues corresponding to positions 76-85 of G28 (designated Motif X, SEQ ID NO: 56):

N/D D/Y A/S/T D/E/Q M/I L/V/F/A V/L/I/Q Y/F/N

are highly conserved in all members of the clade. The rest of Motif X, corresponding to positions 86-91 in G28, is less conserved, but is found in all members of the clade with the exception of G3430:

X X L/M X D/E A/G

Within the G28 clade, a further subclade can be seen that includes only monocot sequences, which share a common evolutionary history since the last common ancestor of monocots and dicots. Alignment of these sequences enabled the definition of those sequence elements that define, structurally, the sequences of the monocot subclade of the G28 clade. These monocot sequences were very similar in their AP2 domains and were distinguished from the dicot sequences by the presence of a highly conserved structural element or motif found just before (nearer the N-terminus than) Motif X. This sequence, herein referred to as “Motif Y”, may be represented by SEQ ID NO: 55, found in G3430 and corresponding to positions 45-61 of G3430. Motif Y is generally found as the subsequence:

S F G/W S/I L V/A A D Q/M W S D/E/G S L P F R.

This latter motif, shown in the monocot-derived sequences appearing in Tables 1 and 2, is considered to comprise a conserved structural element involved in the function of these monocot proteins, and provides a sequence element that is useful in the identification of other monocot transcription factor genes capable of conferring disease resistance in plants.
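As an illustration only, the degenerate form of Motif Y given above can be written as a regular expression and used to scan a protein sequence. The following is a minimal sketch; the function name `find_motif_y` and the flanking residues in the example are hypothetical, and only the motif positions themselves are taken from the text.

```python
import re

# Motif Y as written above: S F G/W S/I L V/A A D Q/M W S D/E/G S L P F R
MOTIF_Y = re.compile(r"SF[GW][SI]L[VA]AD[QM]WS[DEG]SLPFR")


def find_motif_y(protein):
    """Return (1-based start, matched text) for each Motif Y occurrence."""
    return [(m.start() + 1, m.group()) for m in MOTIF_Y.finditer(protein)]


# Hypothetical flanking residues around a Motif Y variant listed in Table 2.
example = "MAAAA" + "SFGSLVADMWSDSLPFR" + "KKK"
print(find_motif_y(example))
```

A pattern of this kind only recognizes the residue alternatives listed in the degenerate motif; a sequence with a residue outside those alternatives at any position is not reported.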

The monocot sequences within the G28 clade thus form a subclade within the G28 clade, said subclade herein referred to as the “G3430 subclade of transcription factor polypeptides”, or simply the “G3430 subclade”.

Relatedness and utilities of the polynucleotides and polypeptides of the invention. Table 1 shows the polypeptides identified by polypeptide SEQ ID NO (first column); Gene ID (GID) No. (second column); the species of plant from which the sequence is derived (third column); the amino acid coordinates of the AP2 domain of the sequence (fourth column); the AP2 domain subsequences of the respective polypeptides (fifth column); the percentage identity to the AP2 domain of G3430 (found within SEQ ID NO: 10; sixth column); for monocot-derived sequences, the subsequence that is similar to Motif Y (seventh column); and the identity in percentage terms of each Motif Y subsequence to the Motif Y of SEQ ID NO: 55 (eighth column). These polypeptide sequences have AP2 domains with 75% or greater identity to the AP2 domain of G3430. Motif Ys in monocots are also highly conserved, and share 82% or greater identity with SEQ ID NO: 55 in the sequences that have been examined (see also Table 2).

TABLE 1 Gene families and binding domains

| SEQ ID NO | GID No. | Species | AP2 domain AA coordinates | AP2 domain | % ID to AP2 domain of G3430 | Motif Y subsequence (in monocots) | % ID to Motif Y, SEQ ID NO: 55 |
|---|---|---|---|---|---|---|---|
| 10 | G3430 | Oryza sativa | 109-173 | RGKHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTFDSAEEAAVAYDRAAYRMRGSRALLNFPLRI | 100% | SFGSLVADQWSESLPFR | 100% |
| 30 | G3864 | Triticum aestivum | 127-191 | RGKHFRGVRQRPWGKFAAEIRDPAKNGARVWLGTFDSAEDAAVAYDRAAYRMRGSRALLNFPLRI | 96% | SFGSLVADQWSESLPFR | 100% |
| 32 | G3865 | Triticum aestivum | 125-189 | RGKHFRGVRQRPWGKFAAEIRDPAKNGARVWLGTFDSAEDAAVAYDRAAYRMRGSRALLNFPLRI | 96% | SFGSLVADQWSESLPFR | 100% |
| 34 | G3856 | Zea mays | 140-204 | RGKHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTYDSAEDAAVAYDRAAYRMRGSRALLNFPLRI | 96% | SFGSLVADQWSESLPFR | 100% |
| 36 | G3848 | Oryza sativa | 149-213 | RGKHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTFDTAEDAALAYDRAAYRMRGSRALLNFPLRI | 95% | SFGSLVADMWSDSLPFR | 88% |
| 12 | G3661 | Zea mays | 126-190 | RGKHYRGVRQRPWGKFAAEIRDPARNGARVWLGTYDTAEDAALAYDRAAYRMRGSRALLNFPLRI | 92% | SFGSLVADQWSGSLPFR | 94% |
| 26 | G3718 | Glycine max | 139-203 | KGKHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTFETAEDAALAYDRAAYRMRGSRALLNFPLRI | 92% | | |
| 8 | G3717 | Glycine max | 130-194 | KGKHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTFETAEDAALAYDRAAYRMRGSRALLNFPLRV | 90% | | |
| 24 | G3844 | Medicago truncatula | 141-205 | KGKHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTFETAEDAALAYDRAAYRMRGSRALLNFPLRV | 90% | | |
| 2 | G28 | Arabidopsis thaliana | 144-208 | KGKHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTFETAEDAALAYDRAAFRMRGSRALLNFPLRV | 89% | | |
| 20 | G3659 | Brassica oleracea | 130-194 | KGKHYRGVRQRPWGKFAAEIRDPAKGARVWLGTFETAEDAALAYDRAAFRMRGSRALLNFPLRV | 89% | | |
| 4 | G1006 | Arabidopsis thaliana | 113-177 | KAKHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTFETAEDAALAYDIAAFRMRGSRALLNFPLRV | 86% | | |
| 22 | G3660 | Brassica oleracea | 119-183 | KGKHYRGVRQRPWGKFAAEIRDPAKKGAREWLGTFETAEDAALAYDRAAFRMRGSRALLNFPLRV | 86% | | |
| 16 | G3846 | Nicotiana tabacum | 95-159 | KGRHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTYETAEEAALAYDKAAYRMRGSKALLNFPHRI | 86% | | |
| 28 | G3843 | Lycopersicon esculentum | 130-194 | KAKHYRGVRVRPWGKFAAEIRDPAKNGARVWLGTYETAEDAALAYDKAAFRMRGSRALLNFPLRI | 84% | | |
| 18 | G3841 Pti4 | Lycopersicon esculentum | 102-166 | KGRHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTYETAEEAAIAYDKAAYRMRGSKAHLNFPHRI | 84% | | |
| 42 | G3858 | Solanum tuberosum | 108-172 | KGRHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTYESAEEAALAYDIAAFRMRGTKALLNFPHRI | 84% | | |
| 38 | G3857 | Solanum tuberosum | 98-162 | KGRHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTYETAEEAAIAYDKAAYRMRGSKAHLNFPHRI | 84% | | |
| 40 | G3852 | Lycopersicon esculentum | 103-167 | KGRHYRGVRQRPWGKFAAEIRDPAKNGARVWLGTYESAEEAALAYGKAAFRMRGTKALLNFPHRI | 83% | | |
| 14 | G3845 | Nicotiana tabacum | 101-165 | RGRHYRGVRRRPWGKFAAEIRDPAKNGARVWLGTYETDEEAAIAYDKAAYRMRGSKAHLNFPHRI | 83% | | |
| 60 | G22 | Arabidopsis thaliana | 88-152 | KGMQYRGVRRRPWGKFAAEIRDPKKNGARVWLGTYETPEDAAVAYDRAAFQLRGSKAKLNFPHLI | 75% | | |

The transcription factors of the invention each possess an AP2 domain, and include paralogs and orthologs of G28 and G3430 found by BLAST analysis, as described below. The transcription factors of the invention that are derived from monocot plants also contain a Motif Y.

TDR polypeptides share several potential protein kinase phosphorylation sites, in particular those phosphorylation sites in regions homologous to those of the Arabidopsis phosphorylation sites at amino acid residues S67, S100, S101, S102, S111, S220, S223, S224, S227 of SEQ ID NO: 2 (G28) and at amino acid residues S73, T188, S189, S192, S193, S194, S204 of SEQ ID NO: 4 (G1006). The potential protein kinase phosphorylation sites are sites that may be modified by a protein kinase selected from, but not limited to, an isoform of protein kinase C, protein kinase A, protein kinase G, casein kinase II, or Pto kinase.

Eleven TDR polypeptide sequences share at least three conserved regions distinct from the AP2 domain. One region, amino acid consensus sequence 1 motif, is exemplified by contiguous amino acid residues L71 through F91 of SEQ ID NO: 2 and has the consensus sequence Leu-Pro-Leu/Phe-Lys/Arg-Glu/Pro/hrSer/Gly/Asp-Asn/Asp-Asp-Ser/Ala-Glu/Asp-Asp-Met-Leu-Val-Val/Leu/Ile-Tyr/Phe-Gly/Thr-Ile/Leu/Val/Ala-Leu-Xaa-Asp-Ala-Phe/Leu/Val, where Xaa is any amino acid residue. A second region, amino acid consensus sequence 2 motif, is exemplified by contiguous amino acid residues K235 through R238 of SEQ ID NO: 2, and comprises basic residues with the consensus sequence Lys-Lys/Arg-Arg/Lys-Arg/Lys. A third region, amino acid consensus sequence 3 motif, is exemplified by contiguous amino acid residues G262 through L268 of SEQ ID NO: 2, and has the consensus sequence Gly/Val/Arg-Asp/Glu/His-Arg/Glu/Gln-Leu-Leu/Val-Val. A fourth region, exemplified by contiguous amino acid residues P213 through R238 of SEQ ID NO: 2, has at least one phosphorylation site flanked by the consensus sequences Pro-Asp/Glu-Pro and Lys-Lys/Arg-Arg/Lys-Lys/Arg, and the phosphorylation site is potentially phosphorylated by at least one isozyme of protein kinase C, protein kinase A, protein kinase G, casein kinase II, or Pto kinase.
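By way of example, a consensus sequence written in three-letter notation, such as the consensus sequence 2 motif above, may be converted into a regular expression for searching one-letter protein sequences. The following is a sketch under stated assumptions: the helper name `consensus_to_regex` is hypothetical, positions are assumed to be separated by hyphens and alternatives by slashes, and Xaa is treated as any residue.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def consensus_to_regex(consensus):
    """Convert e.g. 'Lys-Lys/Arg-Arg/Lys' into 'K[KR][RK]'."""
    parts = []
    for position in consensus.split("-"):
        if position == "Xaa":  # any amino acid residue
            parts.append(".")
            continue
        letters = "".join(THREE_TO_ONE[name] for name in position.split("/"))
        parts.append(letters if len(letters) == 1 else "[" + letters + "]")
    return "".join(parts)


# Consensus sequence 2 motif, as given in the text.
pattern = re.compile(consensus_to_regex("Lys-Lys/Arg-Arg/Lys-Arg/Lys"))
print(pattern.pattern)  # K[KR][RK][RK]
```

An unrecognized three-letter code raises a KeyError in this sketch, so a damaged or ambiguous consensus string is rejected rather than silently searched.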

The AP2 domains of eleven TDR polypeptide sequences comprise a consensus sequence of Gly-Lys-His-Tyr-Arg-Gly-Val-Arg-Gln/Arg-Arg-Pro-Trp-Gly-Lys/Glu-Phe-Ala-Ala-Glu-Ile-Arg-Asp-Pro-Ala-Lys/Arg-Asn-Gly-Ala-Arg-Val-Trp-Leu/His-Gly-Thr-Phe/Tyr-Asp/Glu-Thr/Ser-Ala/Asp-Glu-Asp/Glu-Ala-Ala-Leu/Val/Ile-Ala-Tyr-Asp-Arg/Lys/Ile-Ala-Ala-Phe/Tyr-Arg-Met/Arg-Arg-Gly-Ser-Arg/Lys-Ala-Leu/His-Leu-Asn-Phe-Pro-Leu/His-Arg-Val/Ile-Asn/Gly-Ser/Leu-Gly/lu/Asn-Glu/Asp/Ile-Pro.

The G28 clade is distinguished by, for example, an AP2 domain, an arginine residue at a position corresponding to position 222 of SEQ ID NO: 2, and the ability to confer disease tolerance or resistance in plants. In this context, “corresponding position” refers to a similar or the same position in an alignment of two similar or identical subsequences of distinct G28 clade polypeptides. The sequences that appear in an alignment of polypeptides such as that found in FIGS. 3A-3G (for the present discussion, R222 of G28 and residues in the same clade and column in FIG. 3D) may be used to determine corresponding residues. It will be recognized by those skilled in the art that similar substitutions, such as those identified in Table 5, may be made to corresponding residues in polypeptides that retain the function of the unsubstituted molecule.
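The notion of a corresponding position can be illustrated with a short sketch that maps a residue number in one aligned sequence to the residue in the same alignment column of another. The function name `corresponding_position` and the toy alignment below are hypothetical and are not taken from FIGS. 3A-3G; gaps are assumed to be written as "-".

```python
def corresponding_position(aligned_ref, aligned_other, ref_pos):
    """Map a 1-based residue position in the reference sequence to the
    residue in the same alignment column of the other sequence.

    Returns (position, residue) in the other sequence, or None when the
    other sequence has a gap in that column.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    other_count = 0
    for ref_char, other_char in zip(aligned_ref, aligned_other):
        if ref_char != "-":
            ref_count += 1
        if other_char != "-":
            other_count += 1
        if ref_char != "-" and ref_count == ref_pos:
            return None if other_char == "-" else (other_count, other_char)
    raise ValueError("ref_pos is beyond the end of the reference sequence")


# Toy two-sequence alignment (hypothetical residues).
ref = "MK-RPDE"
other = "M-SRP-E"
print(corresponding_position(ref, other, 3))  # (3, 'R')
print(corresponding_position(ref, other, 5))  # None
```

Residue numbers are counted per sequence with gap characters skipped, which is why a residue at one position in the reference may correspond to a differently numbered residue in another clade member.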

The G3430 subclade of the G28 clade of transcription factors includes the monocot-derived sequences within the G28 clade. The G3430 subclade may be distinguished by the presence of a Motif Y, a 17 amino acid residue subsequence that is substantially identical to SEQ ID NO: 55.

Therefore, the invention provides tdr polynucleotides comprising SEQ ID NO: 1, paralogs, orthologs, and/or equivalog sequences and encoding TDR polypeptides that are members of the G28 clade of transcription factor polypeptides. The polynucleotides are shown to have strong differential expression associated with response to plant pathogen exposure. The invention also encompasses a complement of the polynucleotides. The polynucleotides are useful for screening libraries of molecules or compounds for specific binding and for creating transgenic plants having increased tolerance to pathogens.

Additional polynucleotides of the invention were identified by screening Arabidopsis thaliana and/or other plant cDNA libraries with probes corresponding to known transcription factors under low stringency hybridization conditions. Additional sequences, including full length coding sequences, were subsequently recovered by the rapid amplification of cDNA ends (RACE) procedure, using a commercially available kit according to the manufacturer's instructions. Where necessary, multiple rounds of RACE were performed to isolate 5′ and 3′ ends. The full-length cDNA was then recovered by a routine end-to-end polymerase chain reaction (PCR) using primers specific to the isolated 5′ and 3′ ends. Exemplary sequences are provided in the Sequence Listing.

The polynucleotides are particularly useful when they are hybridizable array elements in a microarray. Such a microarray can be employed to monitor the expression of genes that are differentially expressed in normal, diseased, or callus tissues. The microarray can be used in large scale genetic or gene expression analysis of a large number of polynucleotides, or in the diagnosis of plant diseases or disorders before phenotypic symptoms are evident. Furthermore, the microarray can be employed to investigate cellular responses, such as cell proliferation, transformation, and the like. The array elements may be organized in an ordered fashion so that each element is present at a specified location on the substrate. Because the array elements are at specified locations on the substrate, the hybridization patterns and intensities (that together create a unique expression profile) can be interpreted in terms of expression levels of particular genes and can be correlated with a particular disease, pathology, or treatment.

The invention also entails an agronomic composition comprising a polynucleotide of the invention in conjunction with a suitable carrier, and a method for altering a plant's trait using the composition.

The invention also encompasses transcription factor polypeptides that comprise SEQ ID NO: 55, or a motif that is substantially identical to SEQ ID NO: 55, and have activity substantially similar to that of SEQ ID NO: 2. For example, SEQ ID NO: 10 and SEQ ID NO: 12 include the subsequence:

Ser Phe Gly Ser Leu Val Ala Asp Gln Trp Ser Xaa Ser Leu Pro Phe Arg

where Xaa represents any naturally occurring amino acid residue.

Transcription factor polypeptides that comprise SEQ ID NO: 55 or a motif that is substantially identical to SEQ ID NO: 55, and that have substantially similar functions as G28 or G3430 in conferring disease tolerance or resistance in plants when overexpressed, are intended to fall within the scope of the invention.

Additional monocot ortholog sequences identified using conservation to Motif Y. As a conserved motif found in two monocot orthologs of SEQ ID NO: 2, Motif Y was used to identify additional monocot orthologs of G28. Motif Y was used in a TBLASTN search against all plant nucleotide sequences in GenBank. A significant number of monocot sequences were found that had a minimum of 14 identical residues to the 17 residue Motif Y of SEQ ID NO: 55 (Table 2). Monocot sequences were the only sequences found in this analysis; no dicot Motif Y-like sequences were identified, even allowing for three mismatches to SEQ ID NO: 55. Upon translation of these nucleotide sequences in a frame that provided the identified conserved motif, all the resulting protein sequences were found to have a conserved AP2 binding domain in the expected location. The protein sequences having a conserved AP2 binding domain in the expected location were aligned with the previously aligned set of AP2 sequences, and a neighbor-joining algorithm was used to generate a phylogenetic tree, as described above. In this tree, the additional sequences identified through Motif Y all were found within the G28 clade identified previously, indicating that Motif Y was successfully used to identify new monocot orthologs of G28, listed in Table 2.

TABLE 2 Published Sequences that Comprise Subsequences Highly Similar to Motif Y, SEQ ID NO: 55

| GenBank Accession No. | Species | Motif Y Sequence | Percent Identity to SEQ ID NO: 55 |
|---|---|---|---|
| AU057740 | Oryza sativa | SFGSLVADQWSESLPFR | 100% |
| AX573798 | Oryza sativa | SFGSLVADQWSESLPFR | 100% |
| AX653155 | Oryza sativa | SFGSLVADQWSESLPFR | 100% |
| AK105940 | Oryza sativa | SFGSLVADQWSESLPFR | 100% |
| AK073812 | Oryza sativa | SFGSLVADQWSESLPFR | 100% |
| AJ307662 | Oryza sativa | SFGSLVADQWSESLPFR | 100% |
| CB653231 | Oryza sativa | SFGSLVADQWSESLPFR | 100% |
| AP004676 | Oryza sativa (japonica cultivar-group) | SFGSLVADQWSESLPFR | 100% |
| AAAA01012531 | Oryza sativa (indica cultivar-group) | SFGSLVADQWSESLPFR | 100% |
| CL163362 | Sorghum bicolor | SFGSLVADQWSESLPFR | 100% |
| CD211509 | Sorghum bicolor | SFGSLVADQWSESLPFR | 100% |
| CN130468 | Sorghum bicolor | SFGSLVADQWSESLPFR | 100% |
| BF705208 | Sorghum propinquum | SFGSLVADQWSESLPFR | 100% |
| AL821943 | Triticum aestivum | SFGSLVADQWSESLPFR | 100% |
| CK195316 | Triticum aestivum | SFGSLVADQWSESLPFR | 100% |
| CN012725 | Triticum aestivum | SFGSLVADQWSESLPFR | 100% |
| CN011872 | Triticum aestivum | SFGSLVADQWSESLPFR | 100% |
| CN010562 | Triticum aestivum | SFGSLVADQWSESLPFR | 100% |
| CA741180 | Triticum aestivum | SFGSLVADQWSESLPFR | 100% |
| BE427897 | Triticum turgidum subsp. durum | SFGSLVADQWSESLPFR | 100% |
| CA004558 | Hordeum vulgare subsp. vulgare | SFGSLVADQWSESLPFR | 100% |
| BQ467769 | Hordeum vulgare subsp. vulgare | SFGSLVADQWSESLPFR | 100% |
| CG333070 | Zea mays | SFGSLVADQWSESLPFR | 100% |
| CF626193 | Zea mays | SFGSLVADQWSESLPFR | 100% |
| CG355473 | Zea mays | SFGSLVADQWSESLPFR | 100% |
| CC702573 | Zea mays | SFGSLVADQWSESLPFR | 100% |
| CA121404 | Saccharum officinarum | SFGSLVADQWSGSLPFR | 94% |
| CA141374 | Saccharum officinarum | SFGSLVADQWSGSLPFR | 94% |
| BQ537427 | Saccharum officinarum | SFGSLVADQWSGSLPFR | 94% |
| CA121403 | Saccharum officinarum | SFGSLVADQWSGSLPFR | 94% |
| AW680814 | Sorghum bicolor | SFGSLVADQWSGSLPFR | 94% |
| BG357344 | Sorghum bicolor | SFGSLVADQWSGSLPFR | 94% |
| BG948711 | Sorghum bicolor | SFGSLVADQWSGSLPFR | 94% |
| CG283767 | Zea mays | SFGSLVADQWSGSLPFR | 94% |
| CG239914 | Zea mays | SFGSLVADQWSGSLPFR | 94% |
| CB661210 | Oryza sativa | SFGSLVADMWSDSLPFR | 88% |
| CB670319 | Oryza sativa | SFGSLVADMWSDSLPFR | 88% |
| CB670372 | Oryza sativa | SFGSLVADMWSDSLPFR | 88% |
| CB641135 | Oryza sativa | SFGSLVADMWSDSLPFR | 88% |
| AL607006 | Oryza sativa (japonica cultivar-group) | SFGSLVADMWSDSLPFR | 88% |
| AU197778 | Oryza sativa (japonica cultivar-group) | SFGSLVADMWSDSLPFR | 88% |
| AX654311 | Oryza sativa | SFGSLVADMWSDSLPFR | 88% |
| CB666299 | Oryza sativa | SFGSLVADMWSDSLPFR | 88% |
| CB675534 | Oryza sativa | SFGSLVADMWSDSLPFR | 88% |
| CB660138 | Oryza sativa | SFGSLVADMWSDSLPFR | 88% |
| D23520 | Oryza sativa (japonica cultivar-group) | SFGSLVADMWSDSLPFR | 88% |
| AAAA01003158 | Oryza sativa (indica cultivar-group) | SFGSLVADMWSDSLPFR | 88% |
| C25163 | Oryza sativa (japonica cultivar-group) | SFGSLVADMWSXSLPFR | 88% |
| CN145823 | Sorghum bicolor | SFGSLAADQWSGSLPFR | 88% |
| CG261750 | Zea mays | SFGILVADQWSDSLPFR | 88% |
| CG230966 | Zea mays | SFGILVADQWSDSLPFR | 88% |
| CG230975 | Zea mays | SFGILVADQWSDSLPFR | 88% |
| CG233760 | Zea mays | SFGILVADQWSDSLPFR | 88% |
| CB673022 | Oryza sativa | SFWSLVADMWSDSLPFR | 82% |
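The criterion of at least 14 identical residues out of the 17 residues of Motif Y may be illustrated with a simple ungapped window scan over an already translated protein sequence. This sketch is not the TBLASTN search described above and performs no translation or statistical scoring; the function name `motif_y_like` and the flanking residues in the example are hypothetical.

```python
MOTIF_Y = "SFGSLVADQWSESLPFR"  # Motif Y of SEQ ID NO: 55, as listed in Table 2


def motif_y_like(protein, min_identical=14):
    """Return (1-based start, window, identical count, percent identity)
    for every 17-residue window with at least min_identical identities."""
    width = len(MOTIF_Y)
    hits = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        identical = sum(a == b for a, b in zip(window, MOTIF_Y))
        if identical >= min_identical:
            percent = round(100 * identical / width)
            hits.append((start + 1, window, identical, percent))
    return hits


# Hypothetical flanking residues around the CB673022 Motif Y from Table 2.
print(motif_y_like("AA" + "SFWSLVADMWSDSLPFR" + "GG"))
```

With 17 positions, 14, 15, and 16 identical residues round to 82%, 88%, and 94%, which agrees with the percentages listed in Table 2.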

The correlation between the conserved structural element Motif Y and disease resistance-conferring transcription factors in monocots is striking and, as determined thus far, absolute; Motif Y was always present in monocots nearer the N-terminus than the AP2 domain, but never found in dicots. Motif Y is associated with transcription factors that are part of a clade of AP2 transcription factors known to confer disease resistance, and is thus highly likely to be involved in the disease resistance function of these transcription factors in monocots. Table 2, which shows a number of sequences found to contain a Motif Y, includes sequences discovered in cDNA libraries from wheat plants challenged with Fusarium graminearum (Kruger et al. (2004) NCBI accession numbers CN011872, CN010562 and CN012725). These libraries contained genes of both fungal and plant origin. The authors of these reports appear to have discovered, without identifying a specific function, AP2 transcription factors that contain a Motif Y. The function of these sequences that are apparently produced during fungal challenge is likely attributable to an inducible disease tolerance mechanism. Because of the correlation of Motif Y and disease tolerance-associated transcription factors in monocots, Motif Y is likely to be required for, or to enhance, the up-regulation of pathways involved in conferring disease tolerance or resistance in monocots, a hypothesis that may readily be tested for each monocot plant in which Motif Y is found.

Producing Polypeptides. The polynucleotides of the invention include sequences that encode transcription factors and transcription factor homolog polypeptides and sequences complementary thereto, as well as unique fragments of coding sequence, or sequence complementary thereto. Such polynucleotides can be, e.g., DNA or RNA, e.g., mRNA, cRNA, synthetic RNA, genomic DNA, cDNA, synthetic DNA, oligonucleotides, etc. The polynucleotides are either double-stranded or single-stranded, and include either or both of sense (i.e., coding) sequences and antisense (i.e., non-coding, complementary) sequences. The polynucleotides include the coding sequence of a transcription factor, or transcription factor homolog polypeptide, in isolation, in combination with additional coding sequences (e.g., a purification tag, a localization signal, as a fusion-protein, as a pre-protein, or the like), in combination with non-coding sequences (e.g., introns or inteins, regulatory elements such as promoters, enhancers, terminators, and the like), and/or in a vector or host environment in which the polynucleotide encoding a transcription factor or transcription factor homolog polypeptide is an endogenous or exogenous gene.

A variety of methods exist for producing the polynucleotides of the invention. Procedures for identifying and isolating DNA clones are well known to those of skill in the art, and are described in, e.g., Berger and Kimmel (1987) Guide to Molecular Cloning Techniques, Methods in Enzymology, vol. 152 Academic Press, Inc., San Diego, Calif.; Sambrook et al. (1989) supra, and Ausubel et al. editors, (supplemented through 2000) Current Protocols in Molecular Biology, Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.

Alternatively, polynucleotides of the invention can be produced by a variety of in vitro amplification methods adapted to the present invention by appropriate selection of specific or degenerate primers. Examples of protocols sufficient to direct persons of skill through in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), e.g., for the production of the homologous nucleic acids of the invention, are found in Berger (1987) supra, Sambrook et al. (1989) supra, and Ausubel (2000) supra, as well as Mullis et al. (1990) PCR Protocols A Guide to Methods and Applications (Innis et al. eds) Academic Press Inc. San Diego, Calif. Improved methods for cloning in vitro amplified nucleic acids are described in Wallace et al. U.S. Pat. No. 5,426,039. Improved methods for amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369: 684-685 and the references cited therein, in which PCR amplicons of up to 40 kb are generated. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, e.g., Ausubel (2000) supra, Sambrook et al. (1989) supra, and Berger (1987) supra.

Alternatively, polynucleotides and oligonucleotides of the invention can be assembled from fragments produced by solid-phase synthesis methods. Typically, fragments of up to approximately 100 bases are individually synthesized and then enzymatically or chemically ligated to produce a desired sequence, e.g., a polynucleotide encoding all or part of a transcription factor. For example, chemical synthesis using the phosphoramidite method is described (e.g., by Beaucage et al. (1981) Tetrahedron Letters 22: 1859-1869; and Matthes et al. (1984) EMBO J. 3: 801-805). According to such methods, oligonucleotides are synthesized, purified, annealed to their complementary strand, ligated and then optionally cloned into suitable vectors. If so desired, the polynucleotides and polypeptides of the invention can also be custom ordered from any of a number of commercial suppliers.

Homologous Sequences. Sequences homologous to those provided in the Sequence Listing, derived from Arabidopsis thaliana or from other plants of choice, are also an aspect of the invention. Homologous sequences can be derived from any plant including monocots and dicots and in particular agriculturally important plant species, including but not limited to, crops such as soybean, wheat, corn (maize), potato, cotton, rice, rape, oilseed rape (including canola), sunflower, alfalfa, clover, sugarcane, and turf; or fruits and vegetables, such as banana, blackberry, blueberry, strawberry, raspberry, cantaloupe, carrot, cauliflower, coffee, cucumber, eggplant, grapes, honeydew, lettuce, mango, melon, onion, papaya, peas, peppers, pineapple, pumpkin, spinach, squash, sweet corn, tobacco, tomato, tomatillo, watermelon, rosaceous fruits (such as apple, peach, pear, cherry and plum) and vegetable brassicas (such as broccoli, cabbage, cauliflower, Brussels sprouts, and kohlrabi). Other crops, including fruits and vegetables, whose phenotype can be changed and that comprise homologous sequences include barley; rye; millet; sorghum; currant; avocado; citrus fruits such as oranges, lemons, grapefruit and tangerines; artichoke; cherries; nuts such as the walnut and peanut; endive; leek; roots such as arrowroot, beet, cassava, turnip, radish, yam, and sweet potato; and beans. The homologous sequences may also be derived from woody species, such as pine, poplar and eucalyptus, or mint or other labiates. In addition, homologous sequences may be derived from plants that are evolutionarily related to crop plants, but which may not have yet been used as crop plants. Examples include deadly nightshade (Atropa belladonna), related to tomato; jimson weed (Datura stramonium), related to peyote; and teosinte (Zea species), related to corn (maize).

Orthologs and Paralogs. Homologous sequences as described above can comprise orthologous or paralogous sequences. Several different methods are known by those of skill in the art for identifying and defining these functionally homologous sequences. Three general methods for defining orthologs and paralogs are described. Orthologs, paralogs, or equivalogs may be identified by one or more of the methods described below.

Orthologs and paralogs are evolutionarily related genes that have similar sequence and similar functions. Orthologs are structurally related genes in different species that are derived by a speciation event. Paralogs are structurally related genes within a single species that are derived by a duplication event.

Within a single plant species, gene duplication may produce two copies of a particular gene, giving rise to two or more genes with similar sequence and often similar function known as paralogs. A paralog is therefore a similar gene formed by duplication within the same species. Paralogs typically cluster together or in the same clade (a group of similar genes) when a gene family phylogeny is analyzed using programs such as CLUSTAL (Thompson et al. (1994) Nucleic Acids Res. 22: 4673-4680; Higgins et al. (1996) Methods Enzymol. 266: 383-402). Groups of similar genes can also be identified with pair-wise BLAST analysis (Feng and Doolittle (1987) J. Mol. Evol. 25: 351-360). For example, a clade of very similar MADS domain transcription factors from Arabidopsis all share a common function in flowering time (Ratcliffe et al. (2001) Plant Physiol. 126: 122-132), and a group of very similar AP2 domain transcription factors from Arabidopsis are involved in tolerance of plants to freezing (Gilmour et al. (1998) Plant J. 16: 433-442). Analysis of groups of similar genes with similar function that fall within one clade can yield sub-sequences that are particular to the clade. These sub-sequences, known as consensus sequences, can be used not only to define the sequences within each clade, but also to define the functions of these genes; genes within a clade may contain paralogous sequences, or orthologous sequences that share the same function (see also, for example, Mount (2001), in Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., page 543).

Speciation, the appearance of new species from a parental species, can also give rise to two or more genes with similar sequence and similar function. These genes, termed orthologs, often have an identical function within their host plants and are often interchangeable between species without losing function. Because plants have common ancestors, many genes in any plant species will have a corresponding orthologous gene in another plant species. Once a phylogenetic tree for a gene family of one species has been constructed using a program such as CLUSTAL (Thompson et al. (1994) Nucleic Acids Res. 22: 4673-4680; Higgins et al. (1996) supra), potential orthologous sequences can be placed into the phylogenetic tree and their relationship to genes from the species of interest can be determined. Orthologous sequences can also be identified by a reciprocal BLAST strategy. Once an orthologous sequence has been identified, the function of the ortholog can be deduced from the identified function of the reference sequence.
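A reciprocal BLAST strategy is commonly summarized as a reciprocal best hit test: a gene in one species and a gene in another are treated as candidate orthologs when each is the other's top-scoring hit. The following sketch assumes that the search results have already been parsed into dictionaries of scores; the function names, gene names, and scores are hypothetical, and no BLAST program is invoked.

```python
def best_hit(scores):
    """Return the subject with the highest score, or None if there are no hits."""
    return max(scores, key=scores.get) if scores else None


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (gene_a, gene_b) pairs that are each other's best hit.

    a_vs_b maps each species A gene to {species B gene: score};
    b_vs_a maps each species B gene to {species A gene: score}.
    """
    pairs = []
    for gene_a, hits in a_vs_b.items():
        gene_b = best_hit(hits)
        if gene_b is not None and best_hit(b_vs_a.get(gene_b, {})) == gene_a:
            pairs.append((gene_a, gene_b))
    return pairs


# Hypothetical bit scores from searches in both directions.
a_vs_b = {
    "geneA1": {"geneB1": 250.0, "geneB2": 90.0},
    "geneA2": {"geneB1": 120.0},
}
b_vs_a = {
    "geneB1": {"geneA1": 245.0, "geneA2": 118.0},
    "geneB2": {"geneA1": 88.0},
}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('geneA1', 'geneB1')]
```

In this example geneA2 is not paired, because its best hit, geneB1, returns geneA1 rather than geneA2 in the reverse search.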

Transcription factor gene sequences are conserved across diverse eukaryotic species lines (Goodrich et al. (1993) Cell 75: 519-530; Lin et al. (1991) Nature 353: 569-571; Sadowski et al. (1988) Nature 335: 563-564). Plants are no exception to this observation; diverse plant species possess transcription factors that have similar sequences and functions.

Orthologous genes from different organisms have highly conserved functions, and very often essentially identical functions (Lee et al. (2002) Genome Res. 12: 493-502; Remm et al. (2001) J. Mol. Biol. 314: 1041-1052). Paralogous genes, which have diverged through gene duplication, may retain similar functions of the encoded proteins. In such cases, paralogs can be used interchangeably with respect to certain embodiments of the instant invention (for example, transgenic expression of a coding sequence). An example of such highly related paralogs is the CBF family, with three well-defined members in Arabidopsis and at least one ortholog in Brassica napus (SEQ ID NOs: 46, 48, 50, or 52, respectively), all of which control pathways involved in both freezing and drought stress (Gilmour et al. (1998) Plant J. 16: 433-442; Jaglo et al. (1998) Plant Physiol. 127: 910-917).

The following references represent a small sampling of the many studies that demonstrate that conserved transcription factor genes from diverse species are likely to function similarly (i.e., regulate similar target sequences and control the same traits), and that transcription factors may be transformed into diverse species to confer or improve traits.

-   (1) The Arabidopsis NPR1 gene regulates systemic acquired resistance (SAR); over-expression of NPR1 leads to enhanced resistance in Arabidopsis. When either Arabidopsis NPR1 or the rice NPR1 ortholog was overexpressed in rice (which, as a monocot, is divergent from Arabidopsis) and the plants were challenged with the rice bacterial blight pathogen Xanthomonas oryzae pv. oryzae, the transgenic plants displayed enhanced resistance (Chern et al. (2001) Plant J. 27: 101-113). NPR1 acts through activation of expression of transcription factor genes, such as TGA2 (Fan and Dong (2002) Plant Cell 14: 1377-1389).
-   (2) E2F genes are involved in transcription of plant genes for proliferating cell nuclear antigen (PCNA). Plant E2Fs share a high degree of similarity in amino acid sequence between monocots and dicots, and are even similar to the conserved domains of the animal E2Fs. Such conservation indicates a functional similarity between plant and animal E2Fs. E2F transcription factors that regulate meristem development act through common cis-elements, and regulate related (PCNA) genes (Kosugi and Ohashi (2002) Plant J. 29: 45-59).
-   (3) The ABI5 gene (ABA Insensitive 5) encodes a basic leucine zipper factor required for ABA response in the seed and vegetative tissues. Co-transformation experiments with ABI5 cDNA constructs in rice protoplasts resulted in specific transactivation of the ABA-inducible wheat, Arabidopsis, bean, and barley promoters. These results demonstrate that sequentially similar ABI5 transcription factors are key targets of a conserved ABA signaling pathway in diverse plants (Gampala et al. (2001) J. Biol. Chem. 277: 1689-1694).
-   (4) Sequences of three Arabidopsis GAMYB-like genes were obtained on the basis of sequence similarity to GAMYB genes from barley, rice, and L. temulentum. These three Arabidopsis genes were determined to encode transcription factors (AtMYB33, AtMYB65, and AtMYB101) and could substitute for a barley GAMYB and control alpha-amylase expression (Gocal et al. (2001) Plant Physiol. 127: 1682-1693).
-   (5) The floral control gene LEAFY from Arabidopsis can dramatically accelerate flowering in numerous dicotyledonous plants. Constitutive expression of Arabidopsis LEAFY also caused early flowering in transgenic rice (a monocot), with a heading date that was 26-34 days earlier than that of wild-type plants. These observations indicate that floral regulatory genes from Arabidopsis are useful tools for heading date improvement in cereal crops (He et al. (2000) Transgenic Res. 9: 223-227).
-   (6) Bioactive gibberellins (GAs) are essential endogenous regulators of plant growth. GA signaling tends to be conserved across the plant kingdom. GA signaling is mediated via GAI, a nuclear member of the GRAS family of plant transcription factors. Arabidopsis GAI has been shown to function in rice to inhibit gibberellin response pathways (Fu et al. (2001) Plant Cell 13: 1791-1802).
-   (7) The Arabidopsis gene SUPERMAN (SUP) encodes a putative transcription factor that maintains the boundary between stamens and carpels. By over-expressing Arabidopsis SUP in rice, the effect of the gene's presence on whorl boundaries was shown to be conserved. This demonstrated that SUP is a conserved regulator of floral whorl boundaries and affects cell proliferation (Nandi et al. (2000) Curr. Biol. 10: 215-218).
-   (8) Maize, petunia and Arabidopsis myb transcription factors that regulate flavonoid biosynthesis are very genetically similar and affect the same trait in their native species; therefore, sequence and function of these myb transcription factors correlate with each other in these diverse species (Borevitz et al. (2000) Plant Cell 12: 2383-2394).
-   (9) Wheat reduced height-1 (Rht-B1/Rht-D1) and maize dwarf-8 (d8) genes are orthologs of the Arabidopsis gibberellin insensitive (GAI) gene. Both of these genes have been used to produce dwarf grain varieties that have improved grain yield. These genes encode proteins that resemble nuclear transcription factors and contain an SH2-like domain, indicating that phosphotyrosine may participate in gibberellin signaling. Transgenic rice plants containing a mutant GAI allele from Arabidopsis have been shown to produce reduced responses to gibberellin and are dwarfed, indicating that mutant GAI orthologs could be used to increase yield in a wide range of crop species (Peng et al. (1999) Nature 400: 256-261).

Transcription factors that are homologous to the listed sequences will typically share, in at least one conserved domain, at least about 75% amino acid sequence identity. At the nucleotide level, the sequences will typically share at least about 50% nucleotide sequence identity with one or more of the listed sequences. The degeneracy of the genetic code enables major variations in the nucleotide sequence of a polynucleotide while maintaining the amino acid sequence of the encoded protein.

Percent identity can be determined electronically, e.g., by using the MEGALIGN program (DNASTAR, Inc. Madison, Wis.). The MEGALIGN program can create alignments between two or more sequences according to different methods, for example, the clustal method. (See, for example, Higgins and Sharp (1988) Gene 73: 237-244.) The clustal algorithm groups sequences into clusters by examining the distances between all pairs. The clusters are aligned pairwise and then in groups. Other alignment algorithms or programs, including FASTA, BLAST, or ENTREZ, may also be used to calculate percent similarity. FASTA and BLAST are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with or without default settings. ENTREZ is available through the National Center for Biotechnology Information. In one embodiment, the percent identity of two sequences can be determined by the GCG program with a gap weight of 1, i.e., each amino acid gap is weighted as if it were a single amino acid or nucleotide mismatch between the two sequences (see U.S. Pat. No. 6,262,333).

Other techniques for alignment are described in Doolittle, R. F. (1996) Methods in Enzymology: Computer Methods for Macromolecular Sequence Analysis, vol. 266, Academic Press, Orlando, Fla., USA. Preferably, an alignment program that permits gaps in the sequence is utilized to align the sequences. The Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments (see Shpaer (1997) Methods Mol. Biol. 70: 173-187). Also, the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. An alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel computer. This approach improves the ability to pick up distantly related matches, and is especially tolerant of small gaps and nucleotide sequence errors. Nucleic acid-encoded amino acid sequences can be used to search both protein and DNA databases.

The percentage similarity between two polypeptide sequences, e.g., sequence A and sequence B, is calculated by dividing the length of sequence A, minus the number of gap residues in sequence A, minus the number of gap residues in sequence B, into the sum of the residue matches between sequence A and sequence B, times one hundred. Gaps of low or of no similarity between the two amino acid sequences are not included in determining percentage similarity. Percent identity between polynucleotide sequences can also be counted or calculated by other methods known in the art, e.g., the Jotun Hein method. (See, for example, Hein (1990) Methods Enzymol. 183: 626-645.) Identity between sequences can also be determined by other methods known in the art, e.g., by varying hybridization conditions (see US Patent Application No. 20010010913).
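The percentage similarity calculation described above can be sketched as follows. This is a minimal illustration only; the function name and its parameters are hypothetical, and the match and gap counts are assumed to have been obtained already from an alignment program.

```python
def percent_similarity(matches: int, length_a: int, gaps_a: int, gaps_b: int) -> float:
    """Percentage similarity between sequence A and sequence B.

    The number of residue matches is divided by the length of sequence A
    minus the gap residues in A and minus the gap residues in B, and the
    quotient is multiplied by one hundred.
    """
    denominator = length_a - gaps_a - gaps_b
    if denominator <= 0:
        raise ValueError("length of A must exceed the total number of gap residues")
    return 100.0 * matches / denominator


# Hypothetical alignment: 80 matches, sequence A of length 100, 5 gap residues in each sequence.
example = percent_similarity(matches=80, length_a=100, gaps_a=5, gaps_b=5)
print(f"{example:.1f}% similarity")
```

In this hypothetical case the denominator is 100 − 5 − 5 = 90, so the 80 matches correspond to roughly 88.9% similarity.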

Thus, the invention provides methods for identifying a sequence similar or paralogous or orthologous or homologous to one or more polynucleotides as noted herein, or one or more target polypeptides encoded by the polynucleotides, or otherwise noted herein, and may include linking or associating a given plant phenotype or gene function with a sequence. In the methods, a sequence database is provided (locally or across an internet or intranet) and a query is made against the sequence database using the relevant sequences herein and associated plant phenotypes or gene functions.

In addition, one or more polynucleotide sequences or one or more polypeptides encoded by the polynucleotide sequences may be used to search against BLOCKS (Bairoch et al. (1997) Nucleic Acids Res. 25: 217-221), PFAM, and other databases that contain previously identified and annotated motifs, sequences and gene functions. Methods that search for primary sequence patterns with secondary structure gap penalties (Smith et al. (1992) Protein Engineering 5: 35-51) as well as algorithms such as Basic Local Alignment Search Tool (BLAST; Altschul (1993) J. Mol. Evol. 36: 290-300; Altschul et al. (1990) J. Mol. Biol. 215: 403-410), BLOCKS (Henikoff and Henikoff (1991) Nucleic Acids Res. 19: 6565-6572), Hidden Markov Models (HMM; Eddy (1996) Curr. Opin. Str. Biol. 6: 361-365; Sonnhammer et al. (1997) Proteins 28: 405-420), and the like, can be used to manipulate and analyze polynucleotide and polypeptide sequences encoded by polynucleotides. These databases, algorithms and other methods are well known in the art and are described in Ausubel et al. (1997; Short Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y., unit 7.7) and in Meyers (1995; Molecular Biology and Biotechnology, Wiley VCH, New York, N.Y., p 856-853).

Furthermore, methods using manual alignment of sequences similar or homologous to one or more polynucleotide sequences or one or more polypeptides encoded by the polynucleotide sequences may be used to identify regions of similarity and conserved domains. Such manual methods are well known to those of skill in the art and can include, for example, comparisons of tertiary structure between a polypeptide sequence encoded by a polynucleotide that comprises a known function with a polypeptide sequence encoded by a polynucleotide sequence that has a function not yet determined. Such examples of tertiary structure may comprise predicted alpha helices, beta-sheets, amphipathic helices, leucine zipper motifs, zinc finger motifs, proline-rich regions, cysteine repeat motifs, and the like.

Orthologs and paralogs of presently disclosed transcription factors may be cloned using compositions provided by the present invention according to methods well known in the art. cDNAs may be cloned using mRNA from a plant cell or tissue that expresses one of the present transcription factors. Appropriate mRNA sources may be identified by interrogating Northern blots with probes designed from the present transcription factor sequences, after which a library is prepared from the mRNA obtained from a positive cell or tissue. Transcription factor-encoding cDNA is then isolated by, for example, PCR, using primers designed from a presently disclosed transcription factor gene sequence or by probing with a partial or complete cDNA or with one or more sets of degenerate probes based on the disclosed sequences. The cDNA library may be used to transform plant cells. Expression of the cDNAs of interest is detected using, for example, methods disclosed herein such as microarrays, Northern blots, quantitative PCR, or any other technique for monitoring changes in expression. Genomic clones may be isolated using similar techniques.

Examples of orthologs of the Arabidopsis tdr polynucleotide sequences (SEQ ID NOs: 1 and 3) and TDR polypeptide sequences (SEQ ID NOs: 2 and 4) include, but are not limited to, SEQ ID NOs: 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42.

Identifying Polynucleotides or Nucleic Acids by Hybridization. Polynucleotides homologous to the sequences illustrated in the Sequence Listing and tables can be identified, e.g., by hybridization to each other under stringent or under highly stringent conditions. Single stranded polynucleotides hybridize when they associate based on a variety of well characterized physical-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. The stringency of a hybridization reflects the degree of sequence identity of the nucleic acids involved, such that the higher the stringency, the more similar are the two polynucleotide strands. Stringency is influenced by a variety of factors, including temperature, salt concentration and composition, organic and non-organic additives, solvents, etc. present in both the hybridization and wash solutions and incubations (and number thereof), as described in more detail in the references cited above.

The invention encompasses polynucleotide sequences capable of hybridizing to the claimed polynucleotide sequences, including any of the transcription factor polynucleotides within the Sequence Listing, or fragments thereof, under various conditions of stringency (Wahl and Berger (1987) Methods Enzymol. 152: 399-407; and Kimmel (1987) Methods Enzymol. 152: 507-511). In addition to the nucleotide sequences listed in the Sequence Listing and Tables, full length cDNA, orthologs, and paralogs of the present nucleotide sequences may be identified and isolated using well-known methods. The cDNA libraries, orthologs, and paralogs of the present nucleotide sequences may be screened using hybridization methods to determine their utility as hybridization targets or amplification probes.

With regard to hybridization, conditions that are highly stringent, and means for achieving them, are well known in the art. See, for example, Sambrook et al. (1989) supra; Berger (1987) supra, pages 467-469; and Anderson and Young (1985) “Quantitative Filter Hybridisation.” In: Hames and Higgins, ed., Nucleic Acid Hybridisation, A Practical Approach. Oxford, IRL Press, 73-111.

Stability of DNA duplexes is affected by such factors as base composition, length, and degree of base pair mismatch. Hybridization conditions may be adjusted to allow DNAs of different sequence relatedness to hybridize. The melting temperature (T_(m)) is defined as the temperature at which 50% of the duplex molecules have dissociated into their constituent single strands. The melting temperature of a perfectly matched duplex, where the hybridization buffer contains formamide as a denaturing agent, may be estimated by the following equations:

(I) DNA-DNA: T_(m)(° C.) = 81.5 + 16.6(log [Na+]) + 0.41(% G+C) − 0.62(% formamide) − 500/L

(II) DNA-RNA: T_(m)(° C.) = 79.8 + 18.5(log [Na+]) + 0.58(% G+C) + 0.12(% G+C)² − 0.5(% formamide) − 820/L

(III) RNA-RNA: T_(m)(° C.) = 79.8 + 18.5(log [Na+]) + 0.58(% G+C) + 0.12(% G+C)² − 0.35(% formamide) − 820/L

where L is the length of the duplex formed, [Na+] is the molar concentration of the sodium ion in the hybridization or washing solution, and % G+C is the percentage of (guanine+cytosine) bases in the hybrid. For imperfectly matched hybrids, the melting temperature is reduced by approximately 1° C. for each 1% mismatch.
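By way of illustration, the DNA-DNA estimate and the mismatch correction described above may be sketched as follows. The function name and its parameters are hypothetical, and the logarithm is assumed here to be base 10, which the text does not state explicitly. Only the DNA-DNA equation is shown.

```python
import math


def tm_dna_dna(na_molar: float, gc_percent: float, formamide_percent: float,
               length_bp: int, mismatch_percent: float = 0.0) -> float:
    """Estimated melting temperature (deg C) of a DNA-DNA duplex.

    Implements the DNA-DNA equation given above, then subtracts about
    1 deg C for each 1% of base pair mismatch. The base-10 logarithm is
    an assumption of this sketch.
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("sodium concentration and duplex length must be positive")
    perfectly_matched = (81.5
                         + 16.6 * math.log10(na_molar)
                         + 0.41 * gc_percent
                         - 0.62 * formamide_percent
                         - 500.0 / length_bp)
    return perfectly_matched - mismatch_percent


# Hypothetical duplex: 500 bp, 50% G+C, 1 M Na+, 50% formamide, 10% mismatch.
print(round(tm_dna_dna(1.0, 50.0, 50.0, 500, mismatch_percent=10.0), 1))
```

For the hypothetical values shown, the terms are 81.5 + 0 + 20.5 − 31 − 1 = 70° C. for the perfectly matched duplex, or about 60° C. once the 10% mismatch is taken into account.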

Hybridization experiments are generally conducted in a buffer of pH between 6.8 and 7.4, although the rate of hybridization is nearly independent of pH at ionic strengths likely to be used in the hybridization buffer (Anderson et al. (1985) supra). In addition, one or more of the following may be used to reduce non-specific hybridization: sonicated salmon sperm DNA or another non-complementary DNA, bovine serum albumin, sodium pyrophosphate, sodium dodecylsulfate (SDS), polyvinyl-pyrrolidone, ficoll and Denhardt's solution. Dextran sulfate and polyethylene glycol 6000 act to exclude DNA from solution, thus raising the effective probe DNA concentration and the hybridization signal within a given unit of time. In some instances, conditions of even greater stringency may be desirable or required to reduce non-specific and/or background hybridization. These conditions may be created with the use of higher temperature, lower ionic strength and higher concentration of a denaturing agent such as formamide.

Stringency conditions can be adjusted to screen for moderately similar fragments such as homologous sequences from distantly related organisms, or for highly similar fragments such as genes that duplicate functional enzymes from closely related organisms. The stringency can be adjusted either during the hybridization step or in the post-hybridization washes. Salt concentration, formamide concentration, hybridization temperature and probe lengths are variables that can be used to alter stringency (as described by the formulae above). As a general guideline, high stringency is typically performed at T_(m)−5° C. to T_(m)−20° C., moderate stringency at T_(m)−20° C. to T_(m)−35° C. and low stringency at T_(m)−35° C. to T_(m)−50° C. for duplexes >150 base pairs. Hybridization may be performed at low to moderate stringency (25-50° C. below T_(m)), followed by post-hybridization washes at increasing stringencies. Maximum rates of hybridization in solution are determined empirically to occur at T_(m)−25° C. for DNA-DNA duplex and T_(m)−15° C. for RNA-DNA duplex. Optionally, the degree of dissociation may be assessed after each wash step to determine the need for subsequent, higher stringency wash steps.
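The general guideline above can be expressed as a simple lookup, sketched below. The function name and the returned labels are hypothetical. Because the stated ranges share their end points (T_(m)−20° C. and T_(m)−35° C.), this sketch assumes that a boundary value is assigned to the more stringent class.

```python
def classify_stringency(tm_celsius: float, temperature_celsius: float) -> str:
    """Classify a hybridization or wash temperature relative to T_m.

    Follows the general guideline for duplexes longer than 150 base pairs:
    high stringency at T_m-5 to T_m-20, moderate at T_m-20 to T_m-35, and
    low at T_m-35 to T_m-50 (all in deg C). Shared boundaries are assigned
    to the more stringent class, which is an assumption of this sketch.
    """
    degrees_below_tm = tm_celsius - temperature_celsius
    if 5 <= degrees_below_tm <= 20:
        return "high"
    if 20 < degrees_below_tm <= 35:
        return "moderate"
    if 35 < degrees_below_tm <= 50:
        return "low"
    return "outside guideline range"


# Hypothetical duplex with an estimated T_m of 80 deg C.
for temperature in (65, 50, 40):
    print(temperature, classify_stringency(80.0, temperature))
```

For a duplex with an estimated T_(m) of 80° C., temperatures of 65° C., 50° C. and 40° C. would fall in the high, moderate and low stringency ranges, respectively.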

High stringency conditions may be used to select for nucleic acid sequences with high degrees of identity to the disclosed sequences. An example of stringent hybridization conditions obtained in a filter-based method such as a Southern or northern blot for hybridization of complementary nucleic acids that have more than 100 complementary residues is about 5° C. to 20° C. lower than the thermal melting point (T_(m)) for the specific sequence at a defined ionic strength and pH. Conditions used for hybridization may include about 0.02 M to about 0.15 M sodium chloride, about 0.5% to about 5% casein, about 0.02% SDS or about 0.1% N-laurylsarcosine, about 0.001 M to about 0.03 M sodium citrate, at hybridization temperatures between about 50° C. and about 70° C. More preferably, high stringency conditions are about 0.02 M sodium chloride, about 0.5% casein, about 0.02% SDS, about 0.001 M sodium citrate, at a temperature of about 50° C. Nucleic acid molecules that hybridize under stringent conditions will typically hybridize to a probe based on either the entire DNA molecule or selected portions, e.g., to a unique subsequence, of the DNA.

Stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate. Increasingly stringent conditions may be obtained with less than about 500 mM NaCl and 50 mM trisodium citrate, to even greater stringency with less than about 250 mM NaCl and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, whereas high stringency hybridization may be obtained in the presence of at least about 35% formamide, and more preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30° C., more preferably of at least about 37° C., and most preferably of at least about 42° C. with formamide present. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS) and ionic strength, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed.

The washing steps that follow hybridization may also vary in stringency; the post-hybridization wash steps primarily determine hybridization specificity, with the most critical factors being temperature and the ionic strength of the final wash solution. Wash stringency can be increased by decreasing salt concentration or by increasing temperature. Stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate.

Thus, hybridization and wash conditions that may be used to bind and remove polynucleotides with less than the desired homology to the nucleic acid sequences or their complements that encode the present transcription factors include, for example:

6×SSC at 65° C.;

50% formamide, 4×SSC at 42° C.; or

0.5×SSC, 0.1% SDS at 65° C.;

with, for example, two wash steps of 10-30 minutes each. Useful variations on these conditions will be readily apparent to those skilled in the art.

A person of skill in the art would not expect substantial variation among polynucleotide species encompassed within the scope of the present invention because the stringent conditions set forth in the above formulae yield structurally similar polynucleotides.

If desired, one may employ wash steps of even greater stringency, including about 0.2×SSC, 0.1% SDS at 65° C. and washing twice, each wash step being about 30 minutes, or about 0.1×SSC, 0.1% SDS at 65° C. and washing twice for 30 minutes. The temperature for the wash solutions will ordinarily be at least about 25° C., and for greater stringency at least about 42° C. Hybridization stringency may be increased further by using the same conditions as in the hybridization steps, with the wash temperature raised about 3° C. to about 5° C., and stringency may be increased even further by using the same conditions except the wash temperature is raised about 6° C. to about 9° C. For identification of less closely related homologs, wash steps may be performed at a lower temperature, e.g., 50° C.

An example of a low stringency wash step employs a solution and conditions of at least 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS over 30 minutes. Greater stringency may be obtained at 42° C. in 15 mM NaCl, with 1.5 mM trisodium citrate, and 0.1% SDS over 30 minutes. Even higher stringency wash conditions are obtained at 65° C.-68° C. in a solution of 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Wash procedures will generally employ at least two final wash steps. Additional variations on these conditions will be readily apparent to those skilled in the art (see, for example, US Patent Application No. 20010010913).
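The wash conditions above are given both as multiples of SSC and as millimolar salt concentrations. The conversion between the two may be sketched as follows, assuming the conventional composition of 1×SSC (150 mM NaCl and 15 mM trisodium citrate), which is not stated in the text itself. The function and constant names are hypothetical.

```python
# Assumed composition of 1x SSC (conventional definition, not stated in the text).
NACL_MM_PER_1X_SSC = 150.0
CITRATE_MM_PER_1X_SSC = 15.0


def ssc_to_millimolar(fold: float) -> tuple[float, float]:
    """Return (NaCl mM, trisodium citrate mM) for a given SSC strength."""
    if fold < 0:
        raise ValueError("SSC strength cannot be negative")
    return fold * NACL_MM_PER_1X_SSC, fold * CITRATE_MM_PER_1X_SSC


# SSC strengths that appear in the hybridization and wash conditions above.
for fold in (6.0, 4.0, 0.5, 0.2, 0.1):
    nacl, citrate = ssc_to_millimolar(fold)
    print(f"{fold}x SSC: {nacl:.1f} mM NaCl, {citrate:.2f} mM trisodium citrate")
```

Under this assumption, 0.2×SSC corresponds to the 30 mM NaCl and 3 mM trisodium citrate wash, and 0.1×SSC to the 15 mM NaCl and 1.5 mM trisodium citrate wash.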

Stringency conditions can be selected such that an oligonucleotide that is fully complementary to the coding oligonucleotide hybridizes to the coding oligonucleotide with at least about a 5-10× higher signal to noise ratio than the ratio for hybridization of the perfectly complementary oligonucleotide to a nucleic acid encoding a transcription factor known as of the filing date of the application. It may be desirable to select conditions for a particular assay such that a higher signal to noise ratio, that is, about 15× or more, is obtained. Accordingly, a subject nucleic acid will hybridize to a unique coding oligonucleotide with at least a 2× or greater signal to noise ratio as compared to hybridization of the coding oligonucleotide to a nucleic acid encoding a known polypeptide. The particular signal will depend on the label used in the relevant assay, e.g., a fluorescent label, a colorimetric label, a radioactive label, or the like. Labeled hybridization or PCR probes for detecting related polynucleotide sequences may be produced by oligolabeling, nick translation, end-labeling, or PCR amplification using a labeled nucleotide.

Encompassed by the invention are polynucleotide sequences that are capable of hybridizing to the claimed polynucleotide sequences, including, for example, SEQ ID NO: 9 (G3430), the complement of SEQ ID NO: 9, and fragments thereof under stringent conditions (see, e.g., Wahl and Berger (1987) Methods Enzymol. 152: 399-407; Kimmel (1987) Methods Enzymol. 152: 507-511). Estimates of homology are provided by either DNA-DNA or DNA-RNA hybridization under conditions of stringency as is well understood by those skilled in the art (Hames and Higgins, Eds. (1985) Nucleic Acid Hybridisation, IRL Press, Oxford, U.K.). Stringency conditions can be adjusted to screen for moderately similar fragments, such as homologous sequences from distantly related organisms, to highly similar fragments, such as genes that duplicate functional enzymes from closely related organisms. Post-hybridization washes determine stringency conditions.

Identifying Polynucleotides or Nucleic Acids with Expression Libraries. In addition to hybridization methods, transcription factor homolog polypeptides can be obtained by screening an expression library using antibodies specific for one or more transcription factors. With the provision herein of the disclosed transcription factor, and transcription factor homolog nucleic acid sequences, the encoded polypeptide(s) can be expressed and purified in a heterologous expression system (for example, E. coli) and used to raise antibodies (monoclonal or polyclonal) specific for the polypeptide(s) in question. Antibodies can also be raised against synthetic peptides derived from transcription factor, or transcription factor homolog, amino acid sequences. Methods of raising antibodies are well known in the art and are described in Harlow and Lane (1988), Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, New York. Such antibodies can then be used to screen an expression library produced from the plant from which it is desired to clone additional transcription factor homologs, using the methods described above. The selected cDNAs can be confirmed by sequencing and enzymatic activity.

Sequence Variations. It will readily be appreciated by those of skill in the art that any of a variety of polynucleotide sequences are capable of encoding the transcription factors and transcription factor homolog polypeptides of the invention. Due to the degeneracy of the genetic code, many different polynucleotides can encode identical and/or substantially similar polypeptides in addition to those sequences illustrated in the Sequence Listing. Nucleic acids having a sequence that differs from the sequences shown in the Sequence Listing, or complementary sequences, that encode functionally equivalent peptides (i.e., peptides having some degree of equivalent or similar biological activity) but differ in sequence from the sequence shown in the Sequence Listing due to degeneracy in the genetic code, are also within the scope of the invention.

Altered polynucleotide sequences encoding polypeptides include those sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a polynucleotide encoding a polypeptide with at least one functional characteristic of the instant polypeptides. Included within this definition are polymorphisms that may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding the instant polypeptides, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding the instant polypeptides.

Allelic variant refers to any of two or more alternative forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, and may result in phenotypic polymorphism within populations. Gene mutations can be silent (i.e., no change in the encoded polypeptide) or may encode polypeptides having altered amino acid sequence. The term allelic variant is also used herein to denote a protein encoded by an allelic variant of a gene. Splice variant refers to alternative forms of RNA transcribed from a gene. Splice variation arises naturally through use of alternative splicing sites within a transcribed RNA molecule, or less commonly between separately transcribed RNA molecules, and may result in several mRNAs transcribed from the same gene. Splice variants may encode polypeptides having altered amino acid sequence. The term splice variant is also used herein to denote a protein encoded by a splice variant of an mRNA transcribed from a gene.

Those skilled in the art would recognize that, for example, G3430, SEQ ID NO: 10, represents a single transcription factor; allelic variation and alternative splicing may be expected to occur. Allelic variants of SEQ ID NO: 9 can be cloned by probing cDNA or genomic libraries from different individual organisms according to standard procedures. Allelic variants of the DNA sequence shown in SEQ ID NO: 9, including those containing silent mutations and those in which mutations result in amino acid sequence changes, are within the scope of the present invention, as are proteins that are allelic variants of SEQ ID NO: 10. cDNAs generated from alternatively spliced mRNAs, which retain the properties of the transcription factor, are included within the scope of the present invention, as are polypeptides encoded by such cDNAs and mRNAs. Allelic variants and splice variants of these sequences can be cloned by probing cDNA or genomic libraries from different individual organisms or tissues according to standard procedures known in the art (see U.S. Pat. No. 6,388,064).

Thus, in addition to the sequences set forth in the Sequence Listing (except CBF sequences), the invention also encompasses related nucleic acid molecules that include allelic or splice variants of SEQ ID NOs: 1, 3, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 59, and include sequences that are complementary to any of the above nucleotide sequences. Related nucleic acid molecules also include nucleotide sequences encoding a polypeptide comprising a substitution, modification, addition and/or deletion of one or more amino acid residues compared to the polypeptide as set forth in any of SEQ ID NOs: 2, 4, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42 and 60. Such related polypeptides may comprise, for example, additions and/or deletions of one or more N-linked or O-linked glycosylation sites, or an addition and/or a deletion of one or more cysteine residues.

Table 3 illustrates, for example, that the codons AGC, AGT, TCA, TCC, TCG, and TCT all encode the same amino acid: serine. Accordingly, at each position in the sequence where there is a codon encoding serine, any of the above trinucleotide sequences can be used without altering the encoded polypeptide.

TABLE 3 Codons encoding amino acids

Amino acid                 Possible Codons
Alanine        Ala  A      GCA GCC GCG GCT
Cysteine       Cys  C      TGC TGT
Aspartic acid  Asp  D      GAC GAT
Glutamic acid  Glu  E      GAA GAG
Phenylalanine  Phe  F      TTC TTT
Glycine        Gly  G      GGA GGC GGG GGT
Histidine      His  H      CAC CAT
Isoleucine     Ile  I      ATA ATC ATT
Lysine         Lys  K      AAA AAG
Leucine        Leu  L      TTA TTG CTA CTC CTG CTT
Methionine     Met  M      ATG
Asparagine     Asn  N      AAC AAT
Proline        Pro  P      CCA CCC CCG CCT
Glutamine      Gln  Q      CAA CAG
Arginine       Arg  R      AGA AGG CGA CGC CGG CGT
Serine         Ser  S      AGC AGT TCA TCC TCG TCT
Threonine      Thr  T      ACA ACC ACG ACT
Valine         Val  V      GTA GTC GTG GTT
Tryptophan     Trp  W      TGG
Tyrosine       Tyr  Y      TAC TAT

Sequence alterations that do not change the amino acid sequence encoded by the polynucleotide are termed “silent” variations. With the exception of the codons ATG and TGG, encoding methionine and tryptophan, respectively, any of the possible codons for the same amino acid can be substituted by a variety of techniques, e.g., site-directed mutagenesis, available in the art. Accordingly, any and all such variations of a sequence selected from the above table are a feature of the invention.
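As an illustration of silent variation, the codon assignments of Table 3 can be held in a lookup table and queried for the alternative codons that encode the same amino acid. The sketch below is illustrative only; the names are hypothetical, all codons are written in DNA form (T rather than U), and stop codons are omitted because Table 3 does not list them.

```python
# Codon assignments from Table 3, keyed by three-letter amino acid code.
AMINO_ACID_CODONS = {
    "Ala": ["GCA", "GCC", "GCG", "GCT"], "Cys": ["TGC", "TGT"],
    "Asp": ["GAC", "GAT"], "Glu": ["GAA", "GAG"],
    "Phe": ["TTC", "TTT"], "Gly": ["GGA", "GGC", "GGG", "GGT"],
    "His": ["CAC", "CAT"], "Ile": ["ATA", "ATC", "ATT"],
    "Lys": ["AAA", "AAG"], "Leu": ["TTA", "TTG", "CTA", "CTC", "CTG", "CTT"],
    "Met": ["ATG"], "Asn": ["AAC", "AAT"],
    "Pro": ["CCA", "CCC", "CCG", "CCT"], "Gln": ["CAA", "CAG"],
    "Arg": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGT"],
    "Ser": ["AGC", "AGT", "TCA", "TCC", "TCG", "TCT"],
    "Thr": ["ACA", "ACC", "ACG", "ACT"], "Val": ["GTA", "GTC", "GTG", "GTT"],
    "Trp": ["TGG"], "Tyr": ["TAC", "TAT"],
}

# Reverse lookup: codon -> amino acid.
CODON_TO_AMINO_ACID = {
    codon: amino_acid
    for amino_acid, codons in AMINO_ACID_CODONS.items()
    for codon in codons
}


def silent_alternatives(codon: str) -> list[str]:
    """Return the other codons that encode the same amino acid as `codon`."""
    codon = codon.upper()
    if codon not in CODON_TO_AMINO_ACID:
        raise ValueError(f"{codon} is not listed in Table 3")
    amino_acid = CODON_TO_AMINO_ACID[codon]
    return [other for other in AMINO_ACID_CODONS[amino_acid] if other != codon]


print(silent_alternatives("AGC"))  # alternative serine codons
print(silent_alternatives("ATG"))  # methionine has a single codon
```

A serine codon such as AGC has five silent alternatives, whereas ATG and TGG have none, consistent with the exception noted above.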

In addition to silent variations, other conservative variations that alter one or a few amino acids in the encoded polypeptide can be made without altering the function of the polypeptide; these conservative variants are, likewise, a feature of the invention.

For example, substitutions, deletions and insertions introduced into the sequences provided in the Sequence Listing are also envisioned by the invention. Such sequence modifications can be engineered into a sequence by site-directed mutagenesis (Wu, editor; Methods Enzymol. (1993) vol. 217, Academic Press) or the other methods noted below. Amino acid substitutions are typically of single residues; insertions usually will be on the order of about from 1 to 10 amino acid residues; and deletions will range about from 1 to 30 residues. In preferred embodiments, deletions or insertions are made in adjacent pairs, e.g., a deletion of two residues or insertion of two residues. Substitutions, deletions, insertions or any combination thereof can be combined to arrive at a sequence. The mutations that are made in the polynucleotide encoding the transcription factor should not place the sequence out of reading frame and should not create complementary regions that could produce secondary mRNA structure. Preferably, the polypeptide encoded by the DNA performs the desired function.

Conservative substitutions are those in which at least one residue in the amino acid sequence has been removed and a different residue inserted in its place. Such substitutions generally are made in accordance with Table 4 when it is desired to maintain the activity of the protein. In one embodiment, a transcription factor listed in the Sequence Listing may have up to ten conservative substitutions and retain its function. In another embodiment, transcription factors listed in the Sequence Listing may have more than ten conservative substitutions and still retain their function.

TABLE 4 Conservative substitutions of amino acids

Residue   Conservative Substitutions
Ala       Ser
Arg       Lys
Asn       Gln; His
Asp       Glu
Gln       Asn
Cys       Ser
Glu       Asp
Gly       Pro
His       Asn; Gln
Ile       Leu; Val
Leu       Ile; Val
Lys       Arg; Gln
Met       Leu; Ile
Phe       Met; Leu; Tyr
Ser       Thr; Gly
Thr       Ser; Val
Trp       Tyr
Tyr       Trp; Phe
Val       Ile; Leu
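The conservative substitutions of Table 4 lend themselves to a simple lookup, sketched below with hypothetical names. The sketch assumes that each entry is read in one direction only (the residue in the first column replaced by a residue in the second column), since Table 4 is not stated to apply in both directions.

```python
# Table 4: original residue -> residues regarded as conservative replacements.
CONSERVATIVE_SUBSTITUTIONS = {
    "Ala": {"Ser"}, "Arg": {"Lys"}, "Asn": {"Gln", "His"}, "Asp": {"Glu"},
    "Gln": {"Asn"}, "Cys": {"Ser"}, "Glu": {"Asp"}, "Gly": {"Pro"},
    "His": {"Asn", "Gln"}, "Ile": {"Leu", "Val"}, "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln"}, "Met": {"Leu", "Ile"}, "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr", "Gly"}, "Thr": {"Ser", "Val"}, "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"}, "Val": {"Ile", "Leu"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` is listed in Table 4.

    The lookup is directional (an assumption of this sketch); residues
    absent from the first column of Table 4 have no listed substitutions.
    """
    return replacement in CONSERVATIVE_SUBSTITUTIONS.get(original, set())


print(is_conservative("Ile", "Val"))  # listed in Table 4
print(is_conservative("Ala", "Trp"))  # not listed in Table 4
```

A check of this kind could be applied position by position to count how many conservative substitutions separate a variant from a listed sequence.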

Similar substitutions are those in which at least one residue in the amino acid sequence has been removed and a different residue inserted in its place. Such substitutions may be made in accordance with Table 5 when it is desired to maintain the activity of the protein. Table 5 shows amino acids that can be substituted for an amino acid in a protein and that are typically regarded as structural and functional substitutions. For example, a residue in column 1 of Table 5 may be substituted with a residue in column 2; in addition, a residue in column 2 of Table 5 may be substituted with the residue of column 1.

TABLE 5 Similar substitutions of amino acids

Residue   Similar Substitutions
Ala       Ser; Thr; Gly; Val; Leu; Ile
Arg       Lys; His; Gly
Asn       Gln; His; Gly; Ser; Thr
Asp       Glu; Ser; Thr
Gln       Asn; Ala
Cys       Ser; Gly
Glu       Asp
Gly       Pro; Arg
His       Asn; Gln; Tyr; Phe; Lys; Arg
Ile       Ala; Leu; Val; Gly; Met
Leu       Ala; Ile; Val; Gly; Met
Lys       Arg; His; Gln; Gly; Pro
Met       Leu; Ile; Phe
Phe       Met; Leu; Tyr; Trp; His; Val; Ala
Ser       Thr; Gly; Asp; Ala; Val; Ile; His
Thr       Ser; Val; Ala; Gly
Trp       Tyr; Phe; His
Tyr       Trp; Phe; His
Val       Ala; Ile; Leu; Gly; Thr; Ser; Glu

Substitutions that are less conservative than those in Table 5 can be selected by picking residues that differ more significantly in their effect on maintaining (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a sheet or helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain. The substitutions that in general are expected to produce the greatest changes in protein properties will be those in which (a) a hydrophilic residue, e.g., seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine.

Further Modifying Sequences of the Invention: Mutation/Forced Evolution. In addition to generating silent or conservative substitutions as noted above, the present invention optionally includes methods of modifying the sequences of the Sequence Listing. In the methods, nucleic acid or protein modification methods are used to alter the given sequences to produce new sequences and/or to chemically or enzymatically modify given sequences to change the properties of the nucleic acids or proteins.

Thus, in one embodiment, given nucleic acid sequences are modified, e.g., according to standard mutagenesis or artificial evolution methods to produce modified sequences. The modified sequences may be created using purified natural polynucleotides isolated from any organism or may be synthesized from purified compositions and chemicals using chemical means well known to those of skill in the art. For example, Ausubel (2000) supra, provides additional details on mutagenesis methods. Artificial forced evolution methods are described, for example, by Stemmer (1994; Nature 370: 389-391), Stemmer (1994; Proc. Natl. Acad. Sci. USA 91: 10747-10751), and U.S. Pat. Nos. 5,811,238, 5,837,500, and 6,242,568. Methods for engineering synthetic transcription factors and other polypeptides are described, for example, by Zhang et al. (2000) J. Biol. Chem. 275: 33850-33860, Liu et al. (2001) J. Biol. Chem. 276: 11323-11334, and Isalan et al. (2001) Nature Biotechnol. 19: 656-660. Many other mutation and evolution methods are also available and expected to be within the skill of the practitioner.

Similarly, chemical or enzymatic alteration of expressed nucleic acids and polypeptides can be performed by standard methods. For example, a sequence can be modified by addition of lipids, sugars, peptides, organic or inorganic compounds, by the inclusion of modified nucleotides or amino acids, or the like. For example, protein modification techniques are illustrated in Ausubel (2000) supra. Further details on chemical and enzymatic modifications can be found herein. These modification methods can be used to modify any given sequence, or to modify any sequence produced by the various mutation and artificial evolution modification methods noted herein.

Accordingly, the invention provides for modification of any given nucleic acid by mutation, evolution, chemical or enzymatic modification, or other available methods, as well as for the products produced by practicing such methods, e.g., using the sequences herein as a starting substrate for the various modification approaches.

For example, an optimized coding sequence containing codons preferred by a particular prokaryotic or eukaryotic host can be used, e.g., to increase the rate of translation or to produce recombinant RNA transcripts having desirable properties, such as a longer half-life, as compared with transcripts produced using a non-optimized sequence. Translation stop codons can also be modified to reflect host preference. For example, preferred stop codons for Saccharomyces cerevisiae and mammals are TAA and TGA, respectively. The preferred stop codon for monocotyledonous plants is TGA, whereas insects and E. coli prefer to use TAA as the stop codon.
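
The stop codon preferences stated above can be captured in a small Python sketch. The host labels, the PREFERRED_STOP mapping name, and the swap_stop_codon function are hypothetical illustrations; only the host-to-codon pairings come from the text.

```python
# Host-preferred stop codons as stated in the text; the keys are illustrative labels.
PREFERRED_STOP = {
    "S. cerevisiae": "TAA",
    "mammal": "TGA",
    "monocot": "TGA",
    "insect": "TAA",
    "E. coli": "TAA",
}

STOP_CODONS = {"TAA", "TAG", "TGA"}


def swap_stop_codon(cds, host):
    """Replace the terminal stop codon of a coding sequence with the host's preferred one."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence does not end in a stop codon")
    return cds[:-3] + PREFERRED_STOP[host]
```

Only the final codon is changed, so the encoded polypeptide is unaffected.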

The polynucleotide sequences of the present invention can also be engineered in order to alter a coding sequence for a variety of reasons, including but not limited to, alterations that modify the sequence to facilitate cloning, processing and/or expression of the gene product. For example, alterations are optionally introduced using techniques that are well known in the art, e.g., site-directed mutagenesis, to insert new restriction sites, to alter glycosylation patterns, to change codon preference, to introduce splice sites, etc.

Furthermore, a fragment or domain derived from any of the polypeptides of the invention can be combined with domains derived from other transcription factors or synthetic domains to modify the biological activity of a transcription factor. For instance, a DNA-binding domain derived from a transcription factor of the invention can be combined with the activation domain of another transcription factor or with a synthetic activation domain. A transcription activation domain assists in initiating transcription from a DNA-binding site. Examples include the transcription activation region of VP16 or GAL4 (Moore et al. (1998) Proc. Natl. Acad. Sci. USA 95: 376-381; Aoyama et al. (1995) Plant Cell 7: 1773-1785), peptides derived from bacterial sequences (Ma and Ptashne (1987) Cell 51: 113-119) and synthetic peptides (Giniger and Ptashne (1987) Nature 330: 670-672).

Expression and Modification of Polypeptides. Typically, polynucleotide sequences of the invention are incorporated into recombinant DNA (or RNA) molecules that direct expression of polypeptides of the invention in appropriate host cells, transgenic plants, in vitro translation systems, or the like. Due to the inherent degeneracy of the genetic code, nucleic acid sequences that encode substantially the same or a functionally equivalent amino acid sequence can be substituted for any listed sequence to provide for cloning and expressing the relevant homolog.
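
The degeneracy of the genetic code can be illustrated with a minimal Python sketch that translates two different coding sequences into the same amino acid sequence. The standard genetic code is assumed; the two example sequences and the translate function are invented for illustration and are not sequences of the invention.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a DNA coding sequence into one-letter amino acids ('*' marks a stop)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Two hypothetical sequences that differ at the nucleotide level
# but encode the same polypeptide (Met-Ala-Ser, then stop).
seq_a = "ATGGCTTCATAA"
seq_b = "ATGGCGAGCTGA"
same_protein = translate(seq_a) == translate(seq_b)
```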

The transgenic plants of the present invention comprising recombinant polynucleotide sequences are generally derived from parental plants, which may themselves be non-transformed (or non-transgenic) plants. These transgenic plants may either have a transcription factor gene “knocked out” (for example, with a genomic insertion by homologous recombination, an antisense or ribozyme construct) or expressed to a normal or wild-type extent. However, overexpressing transgenic “progeny” plants will exhibit greater mRNA levels, wherein the mRNA encodes a transcription factor, that is, a DNA-binding protein that is capable of binding to a DNA regulatory sequence and inducing transcription, and preferably, expression of a plant trait gene. Preferably, the mRNA expression level will be at least three-fold greater than that of the parental plant, or more preferably at least ten-fold greater mRNA levels compared to said parental plant, and most preferably at least fifty-fold greater compared to said parental plant.
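
The three-fold, ten-fold and fifty-fold thresholds can be expressed as a simple comparison of relative mRNA levels. The following Python sketch is illustrative only: the function name, the tier labels, and the assumption that both measurements are in the same arbitrary units are not taken from the disclosure.

```python
def overexpression_tier(transgenic_mrna, parental_mrna):
    """Classify the fold increase in mRNA level relative to the parental plant.

    Both inputs are assumed to be positive measurements in the same units.
    """
    if parental_mrna <= 0:
        raise ValueError("parental mRNA level must be positive")
    fold = transgenic_mrna / parental_mrna
    if fold >= 50:
        return "most preferred (at least fifty-fold)"
    if fold >= 10:
        return "more preferred (at least ten-fold)"
    if fold >= 3:
        return "preferred (at least three-fold)"
    return "below three-fold"
```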

Vectors, Promoters, and Expression Systems. The present invention includes recombinant constructs comprising one or more of the nucleic acid sequences herein. The constructs typically comprise a vector, such as a plasmid, a cosmid, a phage, a virus (e.g., a plant virus), a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), or the like, into which a nucleic acid sequence of the invention has been inserted, in a forward or reverse orientation. In a preferred aspect of this embodiment, the construct further comprises regulatory sequences, including, for example, a promoter, operably linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available.

General texts that describe molecular biological techniques useful herein, including the use and production of vectors, promoters and many other relevant topics, include Berger (1987) supra, Sambrook et al. (1989) supra, and Ausubel (2000) supra. Any of the identified sequences can be incorporated into a cassette or vector, e.g., for expression in plants. A number of expression vectors suitable for stable transformation of plant cells or for the establishment of transgenic plants have been described including those described in Weissbach and Weissbach (1989) Methods for Plant Molecular Biology, Academic Press, and Gelvin et al. (1990) Plant Molecular Biology Manual, Kluwer Academic Publishers. Specific examples include those derived from a Ti plasmid of Agrobacterium tumefaciens, as well as those disclosed by Herrera-Estrella et al. (1983) Nature 303: 209, Bevan (1984) Nucleic Acids Res. 12: 8711-8721, Klee (1985) Bio/Technology 3: 637-642, for dicotyledonous plants.

Alternatively, non-Ti vectors can be used to transfer the DNA into monocotyledonous plants and cells by using free DNA delivery techniques. Such methods can involve, for example, the use of liposomes, electroporation, microprojectile bombardment, silicon carbide whiskers, and viruses. By using these methods transgenic plants such as wheat, rice (Christou (1991) Bio/Technology 9: 957-962) and corn (Gordon-Kamm (1990) Plant Cell 2: 603-618) can be produced. An immature embryo can also be a good target tissue for monocots for direct DNA delivery techniques by using the particle gun (Weeks et al. (1993) Plant Physiol. 102: 1077-1084; Vasil (1993) Bio/Technology 10: 667-674; Wan and Lemeaux (1994) Plant Physiol. 104: 37-48), and for Agrobacterium-mediated DNA transfer (Ishida et al. (1996) Nature Biotechnol. 14: 745-750).

Typically, plant transformation vectors include one or more cloned plant coding sequences (genomic or cDNA) under the transcriptional control of 5′ and 3′ regulatory sequences and a dominant selectable marker. Such plant transformation vectors typically also contain a promoter (e.g., a regulatory region controlling inducible or constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific expression), a transcription initiation start site, an RNA processing signal (such as intron splice sites), a transcription termination site, and/or a polyadenylation signal.

A potential utility for the transcription factor polynucleotides disclosed herein is the isolation of promoter elements from these genes that can be used to program expression in plants of any genes. Each transcription factor gene disclosed herein is expressed in a unique fashion, as determined by promoter elements located upstream of the start of translation, and additionally within an intron of the transcription factor gene or downstream of the termination codon of the gene. As is well known in the art, for a significant portion of genes, the promoter sequences are located entirely in the region directly upstream of the start of translation. In such cases, typically the promoter sequences are located within 2.0 kb of the start of translation, or within 1.5 kb of the start of translation, frequently within 1.0 kb of the start of translation, and sometimes within 0.5 kb of the start of translation.
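
As a sketch of how a candidate promoter region might be pulled from a genomic sequence, the Python function below returns the bases immediately upstream of the start of translation, using the 2.0 kb window mentioned above as a default. The function name and parameters are hypothetical, and the gene is assumed to lie on the forward strand of the supplied sequence.

```python
def upstream_region(genomic_seq, atg_index, window_bp=2000):
    """Return up to window_bp bases immediately upstream of the start codon.

    genomic_seq: forward-strand genomic DNA (assumption: gene on this strand).
    atg_index:   0-based position of the 'A' of the ATG start codon.
    window_bp:   size of the upstream window, e.g., 2000, 1500, 1000 or 500.
    """
    if not 0 <= atg_index <= len(genomic_seq):
        raise ValueError("atg_index is outside the genomic sequence")
    start = max(0, atg_index - window_bp)  # clip at the start of the sequence
    return genomic_seq[start:atg_index]
```

Narrowing window_bp to 1500, 1000 or 500 gives the progressively shorter candidate regions described in the text.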

The promoter sequences can be isolated according to methods known to one skilled in the art.

Examples of constitutive plant promoters that can be useful for expressing the transcription factor sequence include: the cauliflower mosaic virus (CaMV) 35S promoter, which confers constitutive, high-level expression in most plant tissues (see, for example, Odell et al. (1985) Nature 313: 810-812); the nopaline synthase promoter (An et al. (1988) Plant Physiol. 88: 547-552); and the octopine synthase promoter (Fromm et al. (1989) Plant Cell 1: 977-984).

A variety of plant gene promoters that regulate gene expression in response to environmental, hormonal, chemical, developmental signals, and in a tissue-active manner can be used for expression of a transcription factor sequence in plants. Choice of a promoter is based largely on the phenotype of interest and is determined by such factors as tissue (e.g., seed, fruit, root, pollen, vascular tissue, flower, carpel, etc.), inducibility (e.g., in response to wounding, heat, cold, drought, light, pathogens, etc.), timing, developmental stage, and the like. Numerous known promoters have been characterized and can favorably be employed to promote expression of a polynucleotide of the invention in a transgenic plant or cell of interest. For example, tissue specific promoters include: seed-specific promoters (such as the napin, phaseolin or DC3 promoter described in U.S. Pat. No. 5,773,697), fruit-specific promoters that are active during fruit ripening (such as the dru 1 promoter (U.S. Pat. No. 5,783,393), or the 2A11 promoter (U.S. Pat. No. 4,943,674) and the tomato polygalacturonase promoter (Bird et al. (1988) Plant Mol. Biol. 11: 651-662)), root-specific promoters, such as those disclosed in U.S. Pat. Nos. 5,618,988, 5,837,848 and 5,905,186, pollen-active promoters such as PTA29, PTA26 and PTA13 (U.S. Pat. No. 5,792,929), promoters active in vascular tissue (Ringli and Keller (1998) Plant Mol. Biol. 37: 977-988), flower-specific (Kaiser et al. (1995) Plant Mol. Biol. 28: 231-243), pollen (Baerson et al. (1994) Plant Mol. Biol. 26: 1947-1959), carpels (Ohl et al. (1990) Plant Cell 2: 837-848), pollen and ovules (Baerson et al. (1993) Plant Mol. Biol. 22: 255-267), auxin-inducible promoters (such as that described in van der Kop et al. (1999) Plant Mol. Biol. 39: 979-990 or Baumann et al. (1999) Plant Cell 11: 323-334), cytokinin-inducible promoter (Guevara-Garcia (1998) Plant Mol. Biol. 38: 743-753), promoters responsive to gibberellin (Shi et al. (1998) Plant Mol. Biol. 38: 1053-1060, Willmott et al. (1998) Plant Mol. Biol. 38: 817-825) and the like. Additional promoters are those that elicit expression in response to heat (Ainley et al. (1993) Plant Mol. Biol. 22: 13-23), light (e.g., the pea rbcS-3A promoter, Kuhlemeier et al. (1989) Plant Cell 1: 471-478, and the maize rbcS promoter, Schaffner and Sheen (1991) Plant Cell 3: 997-1012); wounding (e.g., wunI, Siebertz et al. (1989) Plant Cell 1: 961-968); pathogens (such as the PR-1 promoter described in Buchel et al. (1999) Plant Mol. Biol. 40: 387-396, and the PDF1.2 promoter described in Manners et al. (1998) Plant Mol. Biol. 38: 1071-1080), and chemicals such as methyl jasmonate or salicylic acid (Gatz (1997) Annu. Rev. Plant Physiol. Plant Mol. Biol. 48: 89-108). In addition, the timing of the expression can be controlled by using promoters such as those acting at senescence (Gan and Amasino (1995) Science 270: 1986-1988); or late seed development (Odell et al. (1994) Plant Physiol. 106: 447-458).

Plant expression vectors can also include RNA processing signals that can be positioned within, upstream or downstream of the coding sequence. In addition, the expression vectors can include additional regulatory sequences from the 3′-untranslated region of plant genes, e.g., a 3′ terminator region to increase the stability of the mRNA, such as the PI-II terminator region of potato or the octopine or nopaline synthase 3′ terminator regions.

Additional Expression Elements. Specific initiation signals can aid in efficient translation of coding sequences. These signals can include, e.g., the ATG initiation codon and adjacent sequences. In cases where a coding sequence, its initiation codon and upstream sequences are inserted into the appropriate expression vector, no additional translational control signals may be needed. However, in cases where only coding sequence (e.g., a mature protein coding sequence), or a portion thereof, is inserted, exogenous transcriptional control signals including the ATG initiation codon can be separately provided. The initiation codon is provided in the correct reading frame to facilitate transcription. Exogenous transcriptional elements and initiation codons can be of various origins, both natural and synthetic. The efficiency of expression can be enhanced by the inclusion of enhancers appropriate to the cell system in use.

Expression Hosts. The present invention also relates to host cells that are transduced with vectors of the invention, and the production of polypeptides of the invention (including fragments thereof) by recombinant techniques. Host cells are genetically engineered (i.e., nucleic acids are introduced, e.g., transduced, transformed or transfected) with the vectors of this invention, which may be, for example, a cloning vector or an expression vector comprising the relevant nucleic acids herein. The vector is optionally a plasmid, a viral particle, a phage, a naked nucleic acid, etc. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants, or amplifying the relevant gene. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to those skilled in the art and in the references cited herein, including Sambrook et al. (1989) supra and Ausubel (2000) supra.

The host cell can be a eukaryotic cell, such as a yeast cell, or a plant cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Plant protoplasts are also suitable for some applications. For example, the DNA fragments are introduced into plant tissues, cultured plant cells or plant protoplasts by standard methods including electroporation (Fromm et al. (1985) Proc. Natl. Acad. Sci. USA 82: 5824-5828), infection by viral vectors such as cauliflower mosaic virus (Hohn et al. (1982) Molecular Biology of Plant Tumors, Academic Press, New York, N.Y., pp. 549-560; U.S. Pat. No. 4,407,956), high velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the surface (Klein et al. (1987) Nature 327: 70-73), use of pollen as vector (WO 85/01856), or use of Agrobacterium tumefaciens or A. rhizogenes carrying a T-DNA plasmid in which DNA fragments are cloned. The T-DNA plasmid is transmitted to plant cells upon infection by Agrobacterium tumefaciens, and a portion is stably integrated into the plant genome (Horsch et al. (1984) Science 233: 496-498; Fraley et al. (1983) Proc. Natl. Acad. Sci. USA 80: 4803-4807).

The cell can include a nucleic acid of the invention that encodes a polypeptide, wherein the cell expresses a polypeptide of the invention. The cell can also include vector sequences, or the like. Furthermore, cells and transgenic plants that include any polypeptide or nucleic acid above or throughout this specification, e.g., produced by transduction of a vector of the invention, are an additional feature of the invention.

For long-term, high-yield production of recombinant proteins, stable expression can be used. Host cells transformed with a nucleotide sequence encoding a polypeptide of the invention are optionally cultured under conditions suitable for the expression and recovery of the encoded protein from cell culture. The protein or fragment thereof produced by a recombinant cell may be secreted, membrane-bound, or contained intracellularly, depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides encoding mature proteins of the invention can be designed with signal sequences that direct secretion of the mature polypeptides through a prokaryotic or eukaryotic cell membrane.

Modified Amino Acid Residues. Polypeptides of the invention may contain one or more modified amino acid residues. The presence of modified amino acids may be advantageous in, for example, increasing polypeptide half-life, reducing polypeptide antigenicity or toxicity, increasing polypeptide storage stability, or the like. Amino acid residue(s) are modified, for example, co-translationally or post-translationally during recombinant production or modified by synthetic or chemical means.

Non-limiting examples of a modified amino acid residue include incorporation or other use of acetylated amino acids, glycosylated amino acids, sulfated amino acids, prenylated (e.g., farnesylated, geranylgeranylated) amino acids, PEG modified (for example, “PEGylated”) amino acids, biotinylated amino acids, carboxylated amino acids, phosphorylated amino acids, etc. References adequate to guide one of skill in the modification of amino acid residues are replete throughout the literature.

The modified amino acid residues may prevent or increase affinity of the polypeptide for another molecule, including, but not limited to, polynucleotides, proteins, carbohydrates, lipids and lipid derivatives, and other organic or synthetic compounds.

Identification of Additional Factors. A transcription factor provided by the present invention can also be used to identify additional endogenous or exogenous molecules that can affect a phenotype or trait of interest. On the one hand, such molecules include organic (small or large molecules) and/or inorganic compounds that modulate expression of (i.e., regulate) a particular transcription factor. Alternatively, such molecules include endogenous molecules that are acted upon at a transcriptional level by a transcription factor of the invention to modify a phenotype as desired. For example, the transcription factors can be employed to identify one or more downstream genes that are subject to a regulatory effect of the transcription factor. In one approach, a transcription factor or transcription factor homolog of the invention is expressed in a host cell, e.g., a transgenic plant cell, tissue or explant, and expression products, either RNA or protein, of likely or random targets are monitored, e.g., by hybridization to a microarray of nucleic acid probes corresponding to genes expressed in a tissue or cell type of interest, by two-dimensional gel electrophoresis of protein products, or by any other method known in the art for assessing expression of gene products at the level of RNA or protein. Alternatively, a transcription factor of the invention can be used to identify promoter sequences (such as binding sites on DNA sequences) involved in the regulation of a downstream target. After identifying a promoter sequence, interactions between the transcription factor and the promoter sequence can be modified by changing specific nucleotides in the promoter sequence or specific amino acids in the transcription factor that interact with the promoter sequence to alter a plant trait. Typically, transcription factor DNA-binding sites are identified by gel shift assays. After identifying the promoter regions, the promoter region sequences can be employed in double-stranded DNA arrays to identify molecules that affect the interactions of the transcription factors with their promoters (Bulyk et al. (1999) Nature Biotechnol. 17: 573-577).

The identified transcription factors are also useful to identify proteins that modify the activity of the transcription factor. Such modification can occur by covalent modification, such as by phosphorylation, or by protein-protein (homo- or heteropolymer) interactions. Any method suitable for detecting protein-protein interactions can be employed. Among the methods that can be employed are co-immunoprecipitation, cross-linking and co-purification through gradients or chromatographic columns, and the two-hybrid yeast system.

The two-hybrid system detects protein interactions in vivo and is described in Chien et al. (1991) Proc. Natl. Acad. Sci. USA 88: 9578-9582, and is commercially available from Clontech (Palo Alto, Calif.). In such a system, plasmids are constructed that encode two hybrid proteins: one consists of the DNA-binding domain of a transcription activator protein fused to the transcription factor polypeptide and the other consists of the transcription activator protein's activation domain fused to an unknown protein that is encoded by a cDNA that has been recombined into the plasmid as part of a cDNA library. The DNA-binding domain fusion plasmid and the cDNA library are transformed into a strain of the yeast Saccharomyces cerevisiae that contains a reporter gene (e.g., lacZ) whose regulatory region contains the transcription activator's binding site. Neither hybrid protein alone can activate transcription of the reporter gene. Interaction of the two hybrid proteins reconstitutes the functional activator protein and results in expression of the reporter gene, which is detected by an assay for the reporter gene product. Then, the library plasmids responsible for reporter gene expression are isolated and sequenced to identify the proteins encoded by the library plasmids. After identifying proteins that interact with the transcription factors, assays for compounds that interfere with the transcription factor protein-protein interactions can be performed.

Subsequences. Also contemplated are uses of polynucleotides, also referred to herein as oligonucleotides, typically having at least 12 bases, preferably at least 15, more preferably at least 20, 30, or 50 bases, which hybridize under stringent conditions to a polynucleotide sequence described above. The polynucleotides may be used as probes, primers, sense and antisense agents, and the like, according to methods as noted supra.

Subsequences of the polynucleotides of the invention, including polynucleotide fragments and oligonucleotides, are useful as nucleic acid probes and primers. An oligonucleotide suitable for use as a probe or primer is at least about 15 nucleotides in length, more often at least about 18 nucleotides, often at least about 21 nucleotides, frequently at least about 30 nucleotides, or about 40 nucleotides, or more in length. A nucleic acid probe is useful in hybridization protocols, for example, to identify additional polypeptide homologs of the invention, including protocols for microarray experiments. Primers can be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target DNA strand, and then extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR) or other nucleic-acid amplification methods. See Sambrook et al. (1989) supra, and Ausubel (2000) supra.

In addition, the invention includes an isolated or recombinant polypeptide including a subsequence of at least about 15 contiguous amino acids encoded by the recombinant or isolated polynucleotides of the invention. For example, such polypeptides, or domains or fragments thereof, can be used as immunogens, e.g., to produce antibodies specific for the polypeptide sequence, or as probes for detecting a sequence of interest. A subsequence can range in size from about 15 amino acids in length up to and including the full length of the polypeptide.

To be encompassed by the present invention, an expressed polypeptide that comprises such a polypeptide subsequence performs at least one biological function of the intact polypeptide in substantially the same manner, or to a similar extent, as does the intact polypeptide. For example, a polypeptide fragment can comprise a recognizable structural motif or functional domain such as a DNA binding domain that activates transcription, for example, by binding to a specific DNA promoter region, an activation domain, or a domain for protein-protein interactions.

Traits That May Be Modified in Overexpressing or Knock-out Plants. Presently disclosed transcription factor genes, including G28, G3430 and their equivalogs, have been shown to or are likely to affect a plant's response to various plant diseases, pathogens and pests, and may increase the tolerance or resistance of a plant to more than one pathogen. The pathogenic organisms include, for example, fungal pathogens Fusarium oxysporum, Botrytis cinerea, Sclerotinia sclerotiorum, and Erysiphe orontii. Bacterial pathogens to which resistance may be conferred include Pseudomonas syringae. Other problem organisms may potentially include nematodes, mollicutes, parasites, or herbivorous arthropods. In each case, overexpression of one or more of the transcription factor sequences of the invention may provide benefit to the plant to help prevent or overcome infestation, or be used to manipulate any of the various plant responses to disease. The mechanisms by which the transcription factors work could include increasing surface waxes or oils, surface thickness, or the activation of signal transduction pathways that regulate plant defense in response to attacks by herbivorous pests (including, for example, protease inhibitors). Another means to combat fungal and other pathogens is by accelerating local cell death or senescence, mechanisms used to impair the spread of pathogenic microorganisms throughout a plant. For instance, the best known example of accelerated cell death is the resistance gene-mediated hypersensitive response, which causes localized cell death at an infection site and initiates a systemic defense response. Because many defenses, signaling molecules, and signal transduction pathways are common to defense against different pathogens and pests, such as fungal, bacterial, oomycete, nematode, and insect, transcription factors that are implicated in defense responses against the fungal pathogens tested may also function in defense against other pathogens and pests. For example, the transcription factor from tobacco, Tsi1 (Shin et al. (2002) Mol. Plant-Microbe Interactions 15: 939-989) provides improved resistance in pepper plants to a fungal pathogen (Phytophthora capsici), a bacterial pathogen (Xanthomonas campestris) and a viral pathogen (cucumber mosaic virus).

Production of Transgenic Plants

Modification of Traits. The polynucleotides of the invention are favorably employed to produce transgenic plants with various traits, or characteristics, that have been modified in a desirable manner, e.g., to improve the seed characteristics of a plant. For example, alteration of expression levels or patterns (e.g., spatial or temporal expression patterns) of one or more of the transcription factors (or transcription factor homologs) of the invention, as compared with the levels of the same protein found in a wild-type plant, can be used to modify a plant's traits. An illustrative example of trait modification, improved characteristics, by altering expression levels of a particular transcription factor is described further in the Examples and the Sequence Listing.

Arabidopsis as a model system. Arabidopsis thaliana is the object of rapidly growing attention as a model for genetics and metabolism in plants. Arabidopsis has a small genome, and well-documented studies are available. It is easy to grow in large numbers and mutants defining important genetically controlled mechanisms are either available, or can readily be obtained. Various methods to introduce and express isolated homologous genes are available (see Koncz et al., editors, Methods in Arabidopsis Research (1992) World Scientific, New Jersey, in “Preface”). Because of its small size, short life cycle, obligate autogamy and high fertility, Arabidopsis is also a choice organism for the isolation of mutants and studies in morphogenetic and developmental pathways, and control of these pathways by transcription factors (Koncz (1992) supra, p. 72). A number of studies introducing transcription factors into A. thaliana have demonstrated the utility of this plant for understanding the mechanisms of gene regulation and trait alteration in plants. (See, for example, Koncz supra, and U.S. Pat. No. 6,417,428).

Arabidopsis genes in transgenic plants. Expression of genes that encode transcription factors that modify expression of endogenous genes, polynucleotides, and proteins is well known in the art. In addition, transgenic plants comprising isolated polynucleotides encoding transcription factors may also modify expression of endogenous genes, polynucleotides, and proteins. Examples include Peng et al. (1997) Genes and Development 11: 3194-3205, and Peng et al. (1999) Nature 400: 256-261. In addition, many others have demonstrated that an Arabidopsis transcription factor expressed in an exogenous plant species elicits the same or very similar phenotypic response. See, for example, Fu et al. (2001) Plant Cell 13: 1791-1802; Nandi et al. (2000) Curr. Biol. 10: 215-218; Coupland (1995) Nature 377: 482-483; and Weigel and Nilsson (1995) Nature 377: 482-500.

Homologous genes introduced into transgenic plants. Homologous genes that may be derived from any plant, or from any source whether natural, synthetic, semi-synthetic or recombinant, and that share significant sequence identity or similarity to those provided by the present invention, may be introduced into plants, for example, crop plants, to confer desirable or improved traits. Consequently, transgenic plants may be produced that comprise a recombinant expression vector or cassette with a promoter operably linked to one or more sequences homologous to presently disclosed sequences. The promoter may be, for example, a plant or viral promoter.

The invention thus provides for methods for preparing transgenic plants, and for modifying plant traits. These methods include introducing into a plant a recombinant expression vector or cassette comprising a functional promoter operably linked to one or more sequences homologous to presently disclosed sequences. Plants and kits for producing these plants that result from the application of these methods are also encompassed by the present invention.

Transcription factors of interest for the modification of plant traits. Currently, the existence of a series of maturity groups for different latitudes represents a major barrier to the introduction of new valuable traits. Any trait (e.g., disease resistance) has to be bred into each of the different maturity groups separately, a laborious and costly exercise. The availability of a single strain that could be grown at any latitude would therefore greatly increase the potential for introducing new traits to crop species such as soybean and cotton.

More than one transcription factor gene may be introduced into a plant, either by transforming the plant with one or more vectors comprising two or more transcription factors, or by selective breeding of plants to yield hybrid crosses that comprise more than one introduced transcription factor.

Many of the transcription factors listed in the Sequence Listing may be operably linked with a specific promoter that causes the transcription factor to be expressed in response to environmental, tissue-specific or temporal signals. For examples of flower specific promoters, see Kaiser et al. (supra). For examples of other tissue-specific, temporal-specific or inducible promoters, see the above discussion under the heading “Vectors, Promoters, and Expression Systems”.

Antisense and co-suppression. In addition to expression of the nucleic acids of the invention as gene replacement or plant phenotype modification nucleic acids, the nucleic acids are also useful for sense and anti-sense suppression of expression, e.g., to down-regulate expression of a nucleic acid of the invention, e.g., as a further mechanism for modulating plant phenotype. That is, the nucleic acids of the invention, or subsequences or anti-sense sequences thereof, can be used to block expression of naturally occurring homologous nucleic acids. A variety of sense and anti-sense technologies are known in the art, e.g., as set forth in Lichtenstein and Nellen (1997) Antisense Technology: A Practical Approach IRL Press at Oxford University Press, Oxford, U.K. Antisense regulation is also described in Crowley et al. (1985) Cell 43: 633-641; Rosenberg et al. (1985) Nature 313: 703-706; Preiss et al. (1985) Nature 313: 27-32; Melton (1985) Proc. Natl. Acad. Sci. USA 82: 144-148; Izant and Weintraub (1985) Science 229: 345-352; and Kim and Wold (1985) Cell 42: 129-138. Additional methods for antisense regulation are known in the art. Antisense regulation has been used to reduce or inhibit expression of plant genes, for example, in European Patent Publication No. 271988. Antisense RNA may be used to reduce gene expression to produce a visible or biochemical phenotypic change in a plant (Smith et al. (1988) Nature 334: 724-726; Smith et al. (1990) Plant Mol. Biol. 14: 369-379). In general, sense or anti-sense sequences are introduced into a cell, where they are optionally amplified, for example, by transcription. Such sequences include both simple oligonucleotide sequences and catalytic sequences such as ribozymes.

For example, a reduction or elimination of expression (i.e., a “knock-out”) of a transcription factor or transcription factor homolog polypeptide in a transgenic plant, e.g., to modify a plant trait, can be obtained by introducing an antisense construct corresponding to the polypeptide of interest as a cDNA. For antisense suppression, the transcription factor or homolog cDNA is arranged in reverse orientation (with respect to the coding sequence) relative to the promoter sequence in the expression vector. The introduced sequence need not be the full length cDNA or gene, and need not be identical to the cDNA or gene found in the plant type to be transformed. Typically, the antisense sequence need only be capable of hybridizing to the target gene or RNA of interest. Thus, where the introduced sequence is of shorter length, a higher degree of homology to the endogenous transcription factor sequence will be needed for effective antisense suppression. While antisense sequences of various lengths can be utilized, preferably, the introduced antisense sequence in the vector will be at least 30 nucleotides in length, and improved antisense suppression will typically be observed as the length of the antisense sequence increases. Preferably, the length of the antisense sequence in the vector will be greater than 100 nucleotides. Transcription of an antisense construct as described results in the production of RNA molecules that are the reverse complement of mRNA molecules transcribed from the endogenous transcription factor gene in the plant cell.
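The reverse-complement relationship described above can be illustrated with a minimal sketch. The sequence used here is a short hypothetical fragment, not one of the disclosed sequences, and the function names are chosen for illustration only.

```python
# Minimal sketch: the RNA made from a cDNA cloned in reverse orientation is the
# reverse complement of the endogenous mRNA. All names and sequences are hypothetical.

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def antisense_rna(sense_cdna: str) -> str:
    """RNA transcribed when the cDNA sits in reverse orientation behind the promoter."""
    return reverse_complement(sense_cdna).replace("T", "U")


def can_hybridize(mrna: str, antisense: str) -> bool:
    """True if the antisense RNA pairs perfectly, in antiparallel fashion, with the mRNA."""
    return antisense[::-1].translate(RNA_COMPLEMENT) == mrna


sense_cdna = "ATGGCGTTTGGC"              # hypothetical coding-strand fragment
mrna = sense_cdna.replace("T", "U")      # endogenous transcript of that fragment
anti = antisense_rna(sense_cdna)
print(anti, can_hybridize(mrna, anti))
```

A real construct would of course be at least 30 nucleotides long, and preferably longer than 100, as stated above; the 12-nucleotide fragment only keeps the example readable.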

Suppression of endogenous transcription factor gene expression can also be achieved using a ribozyme. Ribozymes are RNA molecules that possess highly specific endoribonuclease activity. The production and use of ribozymes are disclosed in U.S. Pat. No. 4,987,071 and U.S. Pat. No. 5,543,508. Synthetic ribozyme sequences including antisense RNAs can be used to confer RNA cleaving activity on the antisense RNA, such that endogenous mRNA molecules that hybridize to the antisense RNA are cleaved, which in turn leads to an enhanced antisense inhibition of endogenous gene expression.

Vectors in which RNA encoded by a transcription factor or transcription factor homolog cDNA is over-expressed can also be used to obtain co-suppression of a corresponding endogenous gene, for example, in the manner described in U.S. Pat. No. 5,231,020 to Jorgensen. Such co-suppression (also termed sense suppression) does not require that the entire transcription factor cDNA be introduced into the plant cells, nor does it require that the introduced sequence be exactly identical to the endogenous transcription factor gene of interest. However, as with antisense suppression, the suppressive efficiency will be enhanced as specificity of hybridization is increased, e.g., as the introduced sequence is lengthened, and/or as the sequence similarity between the introduced sequence and the endogenous transcription factor gene is increased.

Vectors expressing an untranslatable form of the transcription factor mRNA (e.g., sequences comprising one or more stop codons or nonsense mutations) can also be used to suppress expression of an endogenous transcription factor, thereby reducing or eliminating its activity and modifying one or more traits. Methods for producing such constructs are described in U.S. Pat. No. 5,583,021. Preferably, such constructs are made by introducing a premature stop codon into the transcription factor gene. Alternatively, a plant trait can be modified by gene silencing using double-strand RNA (Sharp (1999) Genes and Development 13: 139-141). Another method for abolishing the expression of a gene is by insertion mutagenesis using the T-DNA of Agrobacterium tumefaciens. After generating the insertion mutants, the mutants can be screened to identify those containing the insertion in a transcription factor or transcription factor homolog gene. Plants containing a single transgene insertion event at the desired gene can be crossed to generate homozygous plants for the mutation. Such methods are well known to those of skill in the art (see, for example, Koncz et al. (1992) Methods in Arabidopsis Research, World Scientific Publishing Co. Pte. Ltd., River Edge, N.J.).

Suppression of endogenous transcription factor gene expression can also be achieved using RNA interference (RNAi). RNAi is a post-transcriptional, targeted gene-silencing technique that uses double-stranded RNA (dsRNA) to incite degradation of mRNA containing the same sequence as the dsRNA (Constans, (2002) The Scientist 16:36). Small interfering RNAs, or siRNAs, are produced in at least two steps: an endogenous ribonuclease cleaves longer dsRNA into shorter, 21-23 nucleotide-long RNAs. The siRNA segments then mediate the degradation of the target mRNA (Zamore, (2001) Nature Struct. Biol., 8:746-50). RNAi has been used for gene function determination in a manner similar to antisense oligonucleotides (Constans, (2002) The Scientist 16:36). Expression vectors that continually express siRNAs in transiently and stably transfected cells have been engineered to express small hairpin RNAs (shRNAs), which are processed in vivo into siRNA-like molecules capable of carrying out gene-specific silencing (Brummelkamp et al., (2002) Science 296:550-553, and Paddison, et al. (2002) Genes & Dev. 16:948-958). Post-transcriptional gene silencing by double-stranded RNA is discussed in further detail by Hammond et al. (2001) Nature Rev Gen 2: 110-119, Fire et al. (1998) Nature 391: 806-811 and Timmons and Fire (1998) Nature 395: 854.

Alternatively, a plant phenotype can be altered by eliminating an endogenous gene, such as a transcription factor or transcription factor homolog, e.g., by homologous recombination (Kempin et al. (1997) Nature 389: 802-803).

A plant trait can also be modified by using the Cre-lox system (for example, as described in U.S. Pat. No. 5,658,772). A plant genome can be modified to include first and second lox sites that are then contacted with a Cre recombinase. If the lox sites are in the same orientation, the intervening DNA sequence between the two sites is excised. If the lox sites are in the opposite orientation, the intervening sequence is inverted.
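The orientation rule for the two lox sites can be sketched as a small simulation. This is an illustrative model only: the genome is represented as a list of (name, strand) segments, and the segment names are hypothetical.

```python
# Minimal sketch of the Cre-lox outcome rule: same orientation -> excision,
# opposite orientation -> inversion. Segment names are hypothetical.

def cre_recombine(segments):
    """Apply Cre to a list of (name, strand) tuples containing exactly two 'lox' sites."""
    lox = [i for i, (name, _) in enumerate(segments) if name == "lox"]
    if len(lox) != 2:
        raise ValueError("this sketch expects exactly two lox sites")
    first, second = lox
    if segments[first][1] == segments[second][1]:
        # Same orientation: the intervening DNA is excised and one lox site remains.
        return segments[: first + 1] + segments[second + 1 :]
    # Opposite orientation: the intervening DNA is inverted (order and strand flip).
    inverted = [
        (name, "-" if strand == "+" else "+")
        for name, strand in reversed(segments[first + 1 : second])
    ]
    return segments[: first + 1] + inverted + segments[second:]


direct = [("promoter", "+"), ("lox", "+"), ("TF_gene", "+"), ("lox", "+"), ("terminator", "+")]
opposed = [("promoter", "+"), ("lox", "+"), ("TF_gene", "+"), ("lox", "-"), ("terminator", "+")]
print(cre_recombine(direct))
print(cre_recombine(opposed))
```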

The polynucleotides and polypeptides of this invention can also be expressed in a plant in the absence of an expression cassette by manipulating the activity or expression level of the endogenous gene by other means, such as, for example, by ectopically expressing a gene by T-DNA activation tagging (Ichikawa et al. (1997) Nature 390: 698-701; Kakimoto et al. (1996) Science 274: 982-985). This method entails transforming a plant with a gene tag containing multiple transcriptional enhancers and once the tag has inserted into the genome, expression of a flanking gene coding sequence becomes deregulated. In another example, the transcriptional machinery in a plant can be modified so as to increase transcription levels of a polynucleotide of the invention (see, for example, PCT Publications WO 96/06166 and WO 98/53057 that describe the modification of the DNA-binding specificity of zinc finger transcription factor proteins by changing particular amino acids in the DNA-binding motif).

The transgenic plant can also include the machinery necessary for expressing or altering the activity of a polypeptide encoded by an endogenous gene, for example, by altering the phosphorylation state of the polypeptide to maintain it in an activated state.

Transgenic plants (or plant cells, or plant explants, or plant tissues) incorporating the polynucleotides of the invention and/or expressing the polypeptides of the invention can be produced by a variety of well established techniques as described above. Following construction of a vector, most typically an expression cassette, including a polynucleotide, e.g., encoding a transcription factor or transcription factor homolog, of the invention, standard techniques can be used to introduce the polynucleotide into a plant, a plant cell, a plant explant or a plant tissue of interest. Optionally, the plant cell, explant or tissue can be regenerated to produce a transgenic plant.

The plant can be any higher plant, including gymnosperms, monocotyledonous and dicotyledonous plants. Suitable protocols are available for Leguminosae (alfalfa, soybean, clover, etc.), Umbelliferae (carrot, celery, parsnip), Cruciferae (cabbage, radish, rapeseed, broccoli, etc.), Cucurbitaceae (melons and cucumber), Gramineae (wheat, corn, rice, barley, millet, etc.), Solanaceae (potato, tomato, tobacco, peppers, etc.), and various other crops. See protocols described in Ammirato et al., Editors, (1984) Handbook of Plant Cell Culture: Crop Species, Macmillan Publ. Co., New York N.Y.; Shimamoto et al. (1989) Nature 338: 274-276; Fromm et al. (1990) Bio/Technol. 8: 833-839; and Vasil et al. (1990) Bio/Technol. 8: 429-434.

Transformation and regeneration of both monocotyledonous and dicotyledonous plant cells is now routine, and the selection of the most appropriate transformation technique will be determined by the practitioner. The choice of method will vary with the type of plant to be transformed; those skilled in the art will recognize the suitability of particular methods for given plant types. Suitable methods can include, but are not limited to: electroporation of plant protoplasts; liposome-mediated transformation; polyethylene glycol (PEG) mediated transformation; transformation using viruses; micro-injection of plant cells; micro-projectile bombardment of plant cells; vacuum infiltration; and Agrobacterium tumefaciens-mediated transformation. Transformation means introducing a nucleotide sequence into a plant in a manner to cause stable or transient expression of the sequence.

Successful examples of the modification of plant characteristics by transformation with cloned sequences that serve to illustrate the current knowledge in this field of technology, and which are herein incorporated by reference, include: U.S. Pat. Nos. 5,571,706; 5,677,175; 5,510,471; 5,750,386; 5,597,945; 5,589,615; 5,750,871; 5,268,526; 5,780,708; 5,538,880; 5,773,269; 5,736,369 and 5,610,042.

Following transformation, plants are preferably selected using a dominant selectable marker incorporated into the transformation vector. Typically, such a marker will confer antibiotic or herbicide resistance on the transformed plants, and selection of transformants can be accomplished by exposing the plants to appropriate concentrations of the antibiotic or herbicide.

After transformed plants are selected and grown to maturity, those plants showing a modified trait are identified. The modified trait can be any of those traits described above. Additionally, to confirm that the modified trait is due to changes in expression levels or activity of the polypeptide or polynucleotide of the invention, expression can be determined by analyzing mRNA expression using Northern blots, RT-PCR or microarrays, or protein expression using immunoblots or Western blots or gel shift assays.

Integrated Systems: Sequence Identity. Additionally, the present invention may be an integrated system, computer or computer readable medium that comprises an instruction set for determining the identity of one or more sequences in a database. The instruction set can also be used to generate or identify sequences that meet any specified criteria. Furthermore, the instruction set may be used to associate or link certain functional benefits, such as improved characteristics, with one or more identified sequences.

For example, the instruction set can include a sequence comparison or other alignment program, e.g., an available program such as BLAST, FASTA, PILEUP, FINDPATTERNS, or the like, as provided in the Wisconsin Package Version 10.0 (GCG, Madison, Wis.). Public sequence databases such as GenBank, EMBL, Swiss-Prot and PIR, or private sequence databases such as PHYTOSEQ sequence database (Incyte Genomics, Wilmington, Del.) can be searched.

Alignment of sequences for comparison can be conducted by the local homology algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2: 482-489, by the homology alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48: 443-453, by the search for similarity method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85: 2444-2448, or by computerized implementations of these algorithms. After alignment, sequence comparisons between two (or more) polynucleotides or polypeptides are typically performed by comparing the sequences over a comparison window to identify and compare local regions of sequence similarity. The comparison window can be a segment of at least about 20 contiguous positions, usually about 50 to about 200, more usually about 100 to about 150 contiguous positions. A description of the method is provided in Ausubel (2000) supra.
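As an illustration of local alignment in the style of Smith and Waterman, the following minimal sketch fills the dynamic-programming matrix and traces back the best-scoring local region. The match, mismatch and gap values are assumptions chosen for readability, as are the example sequences; they are not parameters prescribed by this disclosure.

```python
# Minimal sketch of Smith-Waterman local alignment with a linear gap penalty.
# Scoring values and sequences are illustrative assumptions.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = smith_waterman("TTTACGTACGAAA", "GGGACGTACGCCC")
print(score, top, bottom)
```

In this example the only shared region is the seven-residue stretch in the middle of both sequences, so the flanking residues are left out of the reported alignment.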

A variety of methods for determining sequence relationships can be used, including manual alignment and computer assisted sequence alignment and analysis. This latter approach is a preferred approach in the present invention, due to the increased throughput afforded by computer assisted methods. As noted above, a variety of computer programs for performing sequence alignment are available, or can be produced by one of skill in the art.

One example algorithm that is suitable for determining percent sequence identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al. (1990) supra. Software for performing BLAST analyses is publicly available, e.g., through the National Library of Medicine's National Center for Biotechnology Information (ncbi.nlm.nih; see at world wide web (www) National Institutes of Health US government (gov) website). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al. (1990) J. Mol. Biol. 215: 403-410). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89: 10915-10919). Unless otherwise indicated, “sequence identity” refers to the percent sequence identity generated from a tblastx analysis using the NCBI version of the algorithm at the default settings using gapped alignments with the filter “off” (NIH NLM NCBI website at ncbi.nlm.nih, supra).
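The seed-and-extend idea described for nucleotide sequences can be sketched as follows. This is a simplified illustration and not the BLAST program itself: it uses exact word matches only (no neighborhood words scored against T), ungapped extension, and a single strand. M=5 and N=−4 follow the defaults quoted above, while the short word length W=4 and the drop-off X=20 are assumptions chosen so that the toy sequences produce hits.

```python
# Minimal sketch of BLAST-style word seeding and ungapped extension.
# W and X values, function names and sequences are illustrative assumptions.

def _extend(query, subject, q, s, step, start_score, M, N, X):
    """Extend in one direction; return (score gained, residues added) at the best point."""
    score = best = start_score
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += M if query[q] == subject[s] else N
        length += 1
        if score > best:
            best, best_len = score, length
        # Halt when the score falls X below its maximum or drops to zero or below.
        if best - score >= X or score <= 0:
            break
        q += step
        s += step
    return best - start_score, best_len


def seed_and_extend(query, subject, W=4, M=5, N=-4, X=20):
    """Return HSPs as (score, query_start, subject_start, length), best first."""
    index = {}
    for s in range(len(subject) - W + 1):
        index.setdefault(subject[s : s + W], []).append(s)
    hsps = set()
    for q in range(len(query) - W + 1):
        for s in index.get(query[q : q + W], []):
            seed = W * M
            r_gain, r_len = _extend(query, subject, q + W, s + W, +1, seed, M, N, X)
            l_gain, l_len = _extend(query, subject, q - 1, s - 1, -1, seed + r_gain, M, N, X)
            hsps.add((seed + r_gain + l_gain, q - l_len, s - l_len, W + l_len + r_len))
    return sorted(hsps, reverse=True)


for hsp in seed_and_extend("TTACGTACGG", "CCACGTACGA"):
    print(hsp)
```

Several overlapping seeds on the same diagonal extend to the same HSP, which is why the results are collected in a set before sorting.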

In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, for example, Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence (and, therefore, in this context, homologous) if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.1, or less than about 0.01, or even less than about 0.001. An additional example of a useful sequence alignment algorithm is PILEUP. PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. The program can align, for example, up to 300 sequences of a maximum length of 5,000 letters.

The integrated system or computer typically includes a user input interface allowing a user to selectively view one or more sequence records corresponding to the one or more character strings, as well as an instruction set that aligns the one or more character strings with each other or with an additional character string to identify one or more regions of sequence similarity. The system may include a link of one or more character strings with a particular phenotype or gene function. Typically, the system includes a user readable output element that displays an alignment produced by the alignment instruction set.

The methods of this invention can be implemented in a localized or distributed computing environment. In a distributed environment, the methods may be implemented on a single computer comprising multiple processors or on a multiplicity of computers. The computers can be linked, e.g., through a common bus, but more preferably the computer(s) are nodes on a network. The network can be a generalized or a dedicated local or wide-area network and, in certain preferred embodiments, the computers may be components of an intranet or an internet.

Thus, the invention provides methods for identifying a sequence similar or homologous to one or more polynucleotides as noted herein, or one or more target polypeptides encoded by the polynucleotides, or otherwise noted herein and may include linking or associating a given plant phenotype or gene function with a sequence. In the methods, a sequence database is provided (locally or across an internet or intranet) and a query is made against the sequence database using the relevant sequences herein and associated plant phenotypes or gene functions.

Any sequence herein can be entered into the database, before or after querying the database. This provides for both expansion of the database and, if done before the querying step, for insertion of control sequences into the database. The control sequences can be detected by the query to ensure the general integrity of both the database and the query. As noted, the query can be performed using a web browser based interface. For example, the database can be a centralized public database such as those noted herein, and the querying can be done from a remote terminal or computer across an internet or intranet.

Any sequence herein can be used to identify a similar, homologous, paralogous, or orthologous sequence in another plant. This provides means for identifying endogenous sequences in other plants that may be useful to alter a trait of progeny plants, which results from crossing two plants of different strains. For example, sequences that encode an ortholog of any of the sequences herein that naturally occur in a plant with a desired trait can be identified using the sequences disclosed herein. The plant is then crossed with a second plant of the same species but which does not have the desired trait to produce progeny that can then be used in further crossing experiments to produce the desired trait in the second plant. Therefore the resulting progeny plant contains no transgenes; expression of the endogenous sequence may also be regulated by treatment with a particular chemical or other means, such as EMR. Some examples of such compounds well known in the art include: ethylene; cytokinins; phenolic compounds, which stimulate the transcription of the genes needed for infection; specific monosaccharides and acidic environments that potentiate vir gene induction; acidic polysaccharides that induce one or more chromosomal genes; and opines. Other mechanisms include light or dark treatment (for a review of examples of such treatments, see Winans (1992) Microbiol. Rev. 56: 12-31; Eyal et al. (1992) Plant Mol. Biol. 19: 589-599; Chrispeels et al. (2000) Plant Mol. Biol. 42: 279-290; Piazza et al. (2002) Plant Physiol. 128: 1077-1086).

Table 6 lists a summary of homologous sequences identified using BLAST (tblastx program). The first column shows the orthologous or homologous polynucleotide GenBank Accession Number (Test Sequence ID), the second column shows the calculated probability value that the sequence identity is due to chance (Smallest Sum Probability), the third column shows the plant species from which the test sequence was isolated (Test Sequence Species), and the fourth column shows the orthologous or homologous test sequence GenBank annotation (Test Sequence GenBank Annotation).

TABLE 6 Sequences orthologous to G28 identified using BLAST

Test Sequence ID  Smallest Sum Probability  Test Sequence Species                   Test Sequence GenBank Annotation
AF245119          2.00E−72                  Mesembryanthemum crystallinum           AP2-related transcription fac
BQ165291          1.00E−68                  Medicago truncatula                     EST611160 KVKC Medicago truncatula cDNA
AB016264          1.00E−57                  Nicotiana sylvestris                    nserf2 gene for ethylene-responsive el
TOBBY4D           2.00E−57                  Nicotiana tabacum                       Tobacco mRNA for EREBP-2, complete cds.
BQ047502          2.00E−57                  Solanum tuberosum                       EST596620 P. infestans-challenged potato
LEU89255          2.00E−56                  Lycopersicon esculentum                 DNA-binding protein Pti4 mRNA, comp
BH454277          2.00E−54                  Brassica oleracea                       BOGSI45TR BOGS Brassica oleracea genomic
BE449392          1.00E−53                  Lycopersicon hirsutum                   EST356151 L. hirsutum trichome, Corne
AB035270          2.00E−50                  Matricaria chamomilla                   McEREBP1 mRNA for ethylene-responsive
AW233956          5.00E−50                  Glycine max                             sf32e02.y1 Gm-c1028 Glycine max cDNA clone GENO
gi7528276         6.10E−71                  Mesembryanthemum crystallinum           AP2-related transcription f
gi8809571         3.30E−56                  Nicotiana sylvestris                    ethylene-responsive element binding
gi3342211         4.20E−56                  Lycopersicon esculentum                 Pti4.
gi1208498         8.70E−56                  Nicotiana tabacum                       EREBP-2.
gi14140141        4.20E−49                  Oryza sativa                            putative AP2-related transcription factor.
gi17385636        3.00E−46                  Matricaria chamomilla                   ethylene-responsive element binding
gi21304712        2.90E−31                  Glycine max                             ethylene-responsive element binding protein 1
gi15623863        5.60E−29                  Oryza sativa (japonica cultivar-group)  contains EST˜hypot
gi8980313         1.20E−26                  Catharanthus roseus                     AP2-domain DNA-binding protein.
gi4099921         3.10E−21                  Stylosanthes hamata                     EREBP-3 homolog.

Molecular Modeling

Another means that may be used to confirm the utility and function of transcription factor sequences that are orthologous or paralogous to presently disclosed transcription factors is through the use of molecular modeling software. Molecular modeling is routinely used to predict polypeptide structure, and a variety of protein structure modeling programs, such as “Insight II” (Accelrys, Inc.) are commercially available for this purpose. Modeling can thus be used to predict which residues of a polypeptide can be changed without altering function (Crameri et al. (2003) U.S. Pat. No. 6,521,453). Thus, polypeptides that are sequentially similar can be shown to have a high likelihood of similar function by their structural similarity, which may, for example, be established by comparison of regions of superstructure. The relative tendencies of amino acids to form regions of superstructure (for example, α-helices and β-sheets) are well established. For example, O'Neil et al. ((1990) Science 250: 646-651) have discussed in detail the helix forming tendencies of amino acids. Tables of relative structure forming activity for amino acids can be used as substitution tables to predict which residues can be functionally substituted in a given region, for example, in DNA-binding domains of known transcription factors and equivalogs. Homologs that are likely to be functionally similar can then be identified.

Of particular interest is the structure of a transcription factor in the region of its conserved domains, such as those identified in FIGS. 3A-3B (Motif Y) and FIGS. 3D-3E (AP2 domains). Structural analyses may be performed by comparing the structure of the known transcription factor around its conserved domain with those of orthologs and paralogs. Analysis of a number of polypeptides within a transcription factor group or clade, including the functionally or sequentially similar polypeptides provided in the Sequence Listing, may also provide an understanding of structural elements required to regulate transcription within a given family.

EXAMPLES

It is to be understood that this invention is not limited to the particular materials and methods described. Although particular embodiments are described, equivalent embodiments may be used to practice the invention. The described embodiments are not intended to limit the scope of the invention, which is limited only by the appended claims. The examples below are provided to enable the subject invention and are not included for the purpose of limiting the invention.

The invention, now being generally described, will be more readily understood by reference to the following examples, which are included merely for purposes of illustration of certain aspects and embodiments of the present invention and are not intended to limit the invention. It will be recognized by one of skill in the art that a transcription factor associated with a particular first trait may be associated with at least one other, unrelated and inherent second trait that was not predicted by the first trait.

Example I Full Length Gene Identification and Cloning

Putative transcription factor sequences (genomic or ESTs) related to known transcription factors were identified in the Arabidopsis thaliana GenBank database using the tblastn sequence analysis program using default parameters and a P-value cutoff threshold of −4 or −5 or lower, depending on the length of the query sequence. Putative transcription factor sequence hits were then screened to identify those containing particular sequence strings. If the sequence hits contained such sequence strings, the sequences were confirmed as transcription factors.
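The two-stage screen described above (a probability cutoff followed by a check for particular sequence strings) might be sketched as follows. This sketch assumes that the cutoff of −4 or −5 denotes a power of ten (that is, P of 1e−4 or 1e−5 or lower); the hit records, the accession names and the required string are all hypothetical placeholders, since the actual strings are not specified here.

```python
# Minimal sketch of a P-value cutoff followed by a sequence-string screen.
# Accessions, sequences and the required string are hypothetical.
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    p_value: float
    translated_sequence: str


def passes_cutoff(hit: Hit, cutoff_exponent: int = -4) -> bool:
    """True if the hit's P-value is at or below 10 ** cutoff_exponent."""
    return hit.p_value <= 10.0 ** cutoff_exponent


def confirm_hits(hits, cutoff_exponent, required_strings):
    """Keep hits under the cutoff that contain every required sequence string."""
    return [
        h.accession
        for h in hits
        if passes_cutoff(h, cutoff_exponent)
        and all(s in h.translated_sequence for s in required_strings)
    ]


hits = [
    Hit("hypothetical_1", 2e-72, "MKRYRGVRQRPWGKWAAEIR"),
    Hit("hypothetical_2", 3e-3, "MKRYRGVRQRPWGKWAAEIR"),   # fails the cutoff
    Hit("hypothetical_3", 1e-9, "MSSTTLLPPQQNNAAGGEEK"),   # lacks the string
]
print(confirm_hits(hits, -4, ["YRGVRQRPWGKW"]))
```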

Alternatively, Arabidopsis thaliana cDNA libraries derived from different tissues or treatments, or genomic libraries were screened to identify novel members of a transcription factor family using a low stringency hybridization approach. Probes were synthesized using gene specific primers in a standard PCR reaction (annealing temperature 60° C.) and labeled with ³²P dCTP using the High Prime DNA Labeling Kit (Boehringer Mannheim Corp. (now Roche Diagnostics Corp., Indianapolis, Ind.)). Purified radiolabelled probes were added to filters immersed in Church hybridization medium (0.5 M NaPO₄ pH 7.0, 7% SDS, 1% w/v bovine serum albumin) and hybridized overnight at 60° C. with shaking. Filters were washed two times for 45 to 60 minutes with 1×SSC, 1% SDS at 60° C.

To identify additional sequence 5′ or 3′ of a partial cDNA sequence in a cDNA library, 5′ and 3′ rapid amplification of cDNA ends (RACE) was performed using the MARATHON cDNA amplification kit (Clontech, Palo Alto, Calif.). Generally, the method entailed first isolating poly(A) mRNA, performing first and second strand cDNA synthesis to generate double stranded cDNA, blunting cDNA ends, followed by ligation of the MARATHON Adaptor to the cDNA to form a library of adaptor-ligated ds cDNA.

Gene-specific primers were designed to be used along with adaptor specific primers for both 5′ and 3′ RACE reactions. Nested primers, rather than single primers, were used to increase PCR specificity. Using 5′ and 3′ RACE reactions, 5′ and 3′ RACE fragments were obtained, sequenced and cloned. The process can be repeated until the 5′ and 3′ ends of the full-length gene are identified. Then the full-length cDNA was generated by end-to-end PCR using primers specific to the 5′ and 3′ ends of the gene.

Example II Construction of Expression Vectors

The sequence was amplified from a genomic or cDNA library using primers specific to sequences upstream and downstream of the coding region. The expression vector was pMEN20 or pMEN65, which are both derived from pMON316 (Sanders et al. (1987) Nucleic Acids Res. 15:1543-1558) and contain the CaMV 35S promoter to express transgenes. To clone the sequence into the vector, both pMEN20 and the amplified DNA fragment were digested separately with SalI and NotI restriction enzymes at 37° C. for 2 hours. The digestion products were subject to electrophoresis in a 0.8% agarose gel and visualized by ethidium bromide staining. The DNA fragments containing the sequence and the linearized plasmid were excised and purified by using a QIAQUICK gel extraction kit (Qiagen, Valencia, Calif.). The fragments of interest were ligated at a ratio of 3:1 (vector to insert). Ligation reactions using T4 DNA ligase (New England Biolabs, Beverly, Mass.) were carried out at 16° C. for 16 hours. The ligated DNAs were transformed into competent cells of the E. coli strain DH5alpha by using the heat shock method. The transformations were plated on LB plates containing 50 mg/l kanamycin (Sigma Chemical Co., St. Louis, Mo.). Individual colonies were grown overnight in five milliliters of LB broth containing 50 mg/l kanamycin at 37° C. Plasmid DNA was purified by using Qiaquick Mini Prep kits (Qiagen, Valencia, Calif.).
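For setting up a ligation at a stated vector-to-insert ratio, the insert mass can be estimated from the fragment lengths. The sketch below assumes that the 3:1 ratio is a molar ratio, which the text does not state, and the vector mass and fragment sizes are hypothetical values chosen only to make the arithmetic easy to follow.

```python
# Minimal sketch: insert mass needed for a given vector:insert molar ratio.
# The molar interpretation of the ratio and all numeric inputs are assumptions.

def insert_mass_ng(vector_ng, vector_bp, insert_bp, vector_to_insert_molar_ratio=3.0):
    """Mass of insert (ng) giving the requested vector:insert molar ratio.

    For double-stranded DNA, moles are proportional to mass divided by length,
    so equimolar insert mass is vector_ng * insert_bp / vector_bp.
    """
    equimolar_insert_ng = vector_ng * (insert_bp / vector_bp)
    return equimolar_insert_ng / vector_to_insert_molar_ratio


# Hypothetical example: 90 ng of a 12,000 bp vector and a 1,000 bp insert.
print(round(insert_mass_ng(90, 12000, 1000), 2))
```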

Example III Transformation of Agrobacterium with the Expression Vector

After the plasmid vector containing the gene was constructed, the vector was used to transform Agrobacterium tumefaciens cells expressing the gene products. The stock of Agrobacterium tumefaciens cells for transformation was made as described by Nagel et al. (1990) FEMS Microbiol Letts. 67: 325-328. Agrobacterium strain ABI was grown in 250 ml LB medium (Sigma) overnight at 28° C. with shaking until an absorbance over 1 cm at 600 nm (A₆₀₀) of 0.5-1.0 was reached. Cells were harvested by centrifugation at 4,000×g for 15 minutes at 4° C. Cells were then resuspended in 250 μl chilled buffer (1 mM HEPES, pH adjusted to 7.0 with KOH). Cells were centrifuged again as described above and resuspended in 125 μl chilled buffer. Cells were then centrifuged and resuspended two more times in the same HEPES buffer as described above at a volume of 100 μl and 750 μl, respectively. Resuspended cells were then distributed into 40 μl aliquots, quickly frozen in liquid nitrogen, and stored at −80° C.

Agrobacterium cells were transformed with plasmids prepared as described above following the protocol described by Nagel et al. (1990) supra. For each DNA construct to be transformed, 50-100 ng DNA (generally resuspended in 10 mM Tris-HCl, 1 mM EDTA, pH 8.0) was mixed with 40 μl of Agrobacterium cells. The DNA/cell mixture was then transferred to a chilled cuvette with a 2 mm electrode gap and subjected to a 2.5 kV charge dissipated at 25 μF and 200 μF using a Gene Pulser II apparatus (Bio-Rad, Hercules, Calif.). After electroporation, cells were immediately resuspended in 1.0 ml LB and allowed to recover without antibiotic selection for 2-4 hours at 28° C. in a shaking incubator. After recovery, cells were plated onto selective medium of LB broth containing 100 μg/ml spectinomycin (Sigma) and incubated for 24-48 hours at 28° C. Single colonies were then picked and inoculated in fresh medium. The presence of the plasmid construct was verified by PCR amplification and sequence analysis.

Example IV Transformation of Arabidopsis Plants

After transformation of Agrobacterium tumefaciens with plasmid vectors containing the gene, single Agrobacterium colonies were identified, propagated, and used to transform Arabidopsis plants. Briefly, 500 ml cultures of LB medium containing 50 mg/l kanamycin were inoculated with the colonies and grown at 28° C. with shaking for 2 days until an optical absorbance at 600 nm wavelength over 1 cm (A₆₀₀) of >2.0 was reached. Cells were then harvested by centrifugation at 4,000×g for 10 minutes, and resuspended in infiltration medium (½× Murashige and Skoog salts (Sigma), 1× Gamborg's B-5 vitamins (Sigma), 5.0% (w/v) sucrose (Sigma), 0.044 μM benzylamino purine (Sigma), 200 μl/l Silwet L-77 (Lehle Seeds)) until an A₆₀₀ of 0.8 was reached.

Prior to transformation, Arabidopsis thaliana seeds (ecotype Columbia) were sown at a density of about 10 plants per 4″ pot onto Pro-Mix BX potting medium (Hummert International) covered with fiberglass mesh (18 mm×16 mm). Plants were grown under continuous illumination (50-75 μE/m²/second) at 22-23° C. with 65-70% relative humidity. After about 4 weeks, primary inflorescence stems (bolts) were cut off to encourage growth of multiple secondary bolts. After flowering of the mature secondary bolts, plants were prepared for transformation by removal of all siliques and opened flowers.

The pots were then immersed upside down in the mixture of Agrobacterium infiltration medium as described above for 30 seconds, and placed on their sides to allow draining onto a 1′×2′ flat surface covered with plastic wrap. After 24 hours, the plastic wrap was removed and the pots were turned upright. The immersion procedure was repeated one week later, for a total of two immersions per pot. Seeds were then collected from each transformation pot and analyzed following the protocol described below.

Example V Identification of Arabidopsis Primary Transformants

Seeds collected from the transformation pots were sterilized essentially as follows. Seeds were dispersed into a solution containing 0.1% (v/v) Triton X-100 (Sigma) and sterile water and washed by shaking the suspension for 20 minutes. The wash solution was then drained and replaced with fresh wash solution to wash the seeds for 20 minutes with shaking. After removal of the ethanol/detergent solution, a solution containing 0.1% (v/v) Triton X-100 and 30% (v/v) bleach (CLOROX; Clorox Corp., Oakland, Calif.) was added to the seeds, and the suspension was shaken for 10 minutes. After removal of the bleach/detergent solution, seeds were then washed five times in sterile distilled water. The seeds were stored in the last wash water at 4° C. for 2 days in the dark before being plated onto antibiotic selection medium (1× Murashige and Skoog salts (pH adjusted to 5.7 with 1M KOH), 1× Gamborg's B-5 vitamins, 0.9% phytagar (Life Technologies), and 50 mg/l kanamycin). Seeds were germinated under continuous illumination (50-75 μE/m²/second) at 22-23° C. After 7-10 days of growth under these conditions, kanamycin resistant primary transformants (T₁ generation) were visible and obtained. These seedlings were transferred first to fresh selection plates where the seedlings continued to grow for 3-5 more days, and then to soil (Pro-Mix BX potting medium).

Primary transformants were crossed and progeny seeds (T₂) collected; kanamycin resistant seedlings were selected and analyzed. The expression levels of the recombinant polynucleotides in the transformants vary from about a 5% expression level increase to at least a 100% expression level increase. Similar observations are made with respect to polypeptide level expression.

Example VI Identification of Arabidopsis Plants with Transcription Factor Gene Knockouts

The screening of insertion mutagenized Arabidopsis collections for null mutants in a known target gene was essentially as described in Krysan et al. (1999) Plant Cell 11: 2283-2290. Briefly, gene-specific primers, nested by 5-250 base pairs to each other, were designed from the 5′ and 3′ regions of a known target gene. Similarly, nested sets of primers were also created specific to each of the T-DNA or transposon ends (the “right” and “left” borders). All possible combinations of gene specific and T-DNA/transposon primers were used to detect by PCR an insertion event within or close to the target gene. The amplified DNA fragments were then sequenced, which allows the precise determination of the T-DNA/transposon insertion point relative to the target gene. Insertion events within the coding or intervening sequence of the genes were deconvoluted from a pool comprising a plurality of insertion events to a single unique mutant plant for functional characterization. The method is described in more detail in Yu and Adam, U.S. application Ser. No. 09/177,733 filed Oct. 23, 1998.
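The enumeration of "all possible combinations" of gene-specific and T-DNA/transposon primers can be illustrated with a minimal sketch. The primer names below are hypothetical placeholders (no primer sequences are given in this example), and the sketch simply pairs every gene-specific primer with every border primer without modeling the successive rounds in which nested primers would normally be used.

```python
from itertools import product

# Hypothetical primer labels; actual primer sequences are not given in the text.
GENE_PRIMERS = ["gene_5p_outer", "gene_5p_nested", "gene_3p_outer", "gene_3p_nested"]
BORDER_PRIMERS = ["LB_outer", "LB_nested", "RB_outer", "RB_nested"]


def primer_pairs(gene_primers, border_primers):
    """Return every (gene-specific, insertion-border) primer pairing."""
    return list(product(gene_primers, border_primers))


pairs = primer_pairs(GENE_PRIMERS, BORDER_PRIMERS)
for gene_primer, border_primer in pairs:
    print(f"{gene_primer} x {border_primer}")
```

With four primers in each set, the sketch yields sixteen candidate PCR reactions per DNA pool.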

Example VII Identification of Modified Phenotypes in Overexpressing or Knockout Plants

Experiments were performed to identify those transformants or knockouts that exhibited an improved pathogen tolerance. For such studies, the transformants were exposed to biotrophic fungal pathogens, such as Erysiphe orontii, and necrotrophic fungal pathogens, such as Fusarium oxysporum. Fusarium oxysporum isolates cause vascular wilts and damping off of various annual vegetables, perennials and weeds (Mauch-Mani and Slusarenko (1994) Molec Plant-Microbe Interact. 7: 378-383). For Fusarium oxysporum experiments, plants were grown on Petri dishes and sprayed with a fresh spore suspension of F. oxysporum. The spore suspension was prepared as follows: a plug of fungal hyphae from a plate culture was placed on a fresh potato dextrose agar plate and allowed to spread for one week. Five ml sterile water was added to the plate, swirled, and pipetted into 50 ml Armstrong Fusarium medium. Spores were grown overnight in Fusarium medium and then sprayed onto plants using a Preval paint sprayer. Plant tissue was harvested and frozen in liquid nitrogen 48 hours post-infection.

Erysiphe orontii is a causal agent of powdery mildew. For Erysiphe orontii experiments, plants were grown approximately four weeks in a greenhouse under 12 hour light (20° C., about 30% relative humidity (rh)). Individual leaves were infected with E. orontii spores from infected plants using a camel's hair brush, and the plants were transferred to a Percival growth chamber (20° C., 80% rh). Plant tissue was harvested and frozen in liquid nitrogen seven days post-infection.

Botrytis cinerea is a necrotrophic pathogen. Botrytis cinerea was grown on potato dextrose agar under 12 hour light (20° C., about 30% relative humidity (rh)). A spore culture was made by spreading 10 ml of sterile water on the fungus plate, swirling and transferring spores to 10 ml of sterile water. The spore inoculum (approx. 10⁵ spores/ml) was then used to spray 10 day-old seedlings grown under sterile conditions on MS (minus sucrose) media. Symptoms were evaluated every day up to approximately 1 week.

Sclerotinia sclerotiorum hyphal cultures were grown in potato dextrose broth. One gram of hyphae was ground, filtered, spun down and resuspended in sterile water. A 1:10 dilution was used to spray 10 day-old seedlings grown aseptically under a 12 hour light/dark regime on MS (minus sucrose) media. Symptoms were evaluated every day up to approximately 1 week.

Pseudomonas syringae pv maculicola (Psm) strain 4326 was inoculated by hand at two doses. Two inoculation doses allowed the differentiation between plants with enhanced susceptibility and plants with enhanced resistance to the pathogen. Plants were grown for three weeks in the greenhouse, then transferred to the growth chamber for the remainder of their growth. Psm ES4326 was hand inoculated with a 1 ml syringe on three fully-expanded leaves per plant (4½ wk old), using at least nine plants per overexpressing line at two inoculation doses, OD=0.005 and OD=0.0005. Disease scoring was performed three days post-inoculation by evaluating the plants and leaves simultaneously.

Expression patterns of the pathogen-induced genes (such as defense genes) were also monitored by microarray experiments. In these experiments, cDNAs were generated by PCR and resuspended at a final concentration of about 100 ng/μl in 3×SSC or 150 mM Na-phosphate (Eisen and Brown (1999) Methods Enzymol. 303: 179-205). The cDNAs were spotted on microscope glass slides coated with polylysine. The prepared cDNAs were aliquoted into 384 well plates and spotted on the slides using, for example, an x-y-z gantry (OmniGrid; GeneMachines, Menlo Park, Calif.) outfitted with quill type pins (Telechem International, Sunnyvale, Calif.). After spotting, the arrays were cured for a minimum of one week at room temperature, rehydrated and blocked following the protocol of Eisen and Brown (Eisen and Brown (1999) supra).

Total RNA samples (10 μg) were labeled using fluorescent Cy3 and Cy5 dyes. Labeled samples were resuspended in 4×SSC/0.03% SDS/4 μg salmon sperm DNA/2 μg tRNA/50 mM Na-pyrophosphate, heated at 95° C. for 2.5 minutes, spun down and placed on the array. The array was covered with a glass coverslip and placed in a sealed chamber. The chamber was kept in a water bath at 62° C. overnight. The arrays were washed as described (Eisen and Brown (1999) supra) and scanned on a General Scanning 3000 laser scanner. The resulting files were quantified with IMAGENE software (BioDiscovery, Los Angeles, Calif.).

Reverse transcriptase PCR or RT-PCR experiments may be performed to identify those genes induced after exposure to biotrophic fungal pathogens, such as Erysiphe orontii, necrotrophic fungal pathogens, such as Fusarium oxysporum, bacteria, viruses and salicylic acid, the latter being involved in a nonspecific resistance response in Arabidopsis thaliana. Generally, the gene expression patterns from ground plant leaf tissue were examined. RT-PCR was conducted using gene specific primers within the coding region for each sequence identified. The primers were designed near the 3′ region of each DNA binding sequence initially identified.

Total RNA from ground leaf tissues was isolated using the CTAB extraction protocol. Once extracted, total RNA was normalized in concentration across all the tissue types, using the 28S band as reference, to ensure that the PCR reaction for each tissue received the same amount of cDNA template. Poly(A+) RNA was purified using a modified protocol from the Qiagen OLIGOTEX purification kit batch protocol. cDNA was synthesized using standard protocols. After the first strand cDNA synthesis, primers for Actin 2 were used to normalize the concentration of cDNA across the tissue types. Actin 2 is found to be constitutively expressed in fairly equal levels across the tissue types being investigated.

cDNA template was mixed with corresponding primers and Taq DNA polymerase. Each reaction consisted of 0.2 μl cDNA template, 2 μl 10× Tricine buffer and 16.8 μl water, 0.05 μl Primer 1, 0.05 μl Primer 2, 0.3 μl Taq DNA polymerase and 8.6 μl water.

The 96 well plate was covered with microfilm and set in the thermocycler to start the reaction cycle. A typical reaction cycle consisted of the following steps:

Step 1: 93° C. for 3 minutes;

Step 2: 93° C. for 30 seconds;

Step 3: 65° C. for 1 minute;

Step 4: 72° C. for 2 minutes;

Steps 2, 3 and 4 are repeated for 28 cycles;

Step 5: 72° C. for 5 minutes; and

Step 6: 4° C.

To amplify more products, for example, to identify genes that have very low expression, additional steps may be performed. In the following illustrative method, the PCR plate is placed back in the thermocycler for eight more cycles of steps 2-4:

Step 2: 93° C. for 30 seconds;

Step 3: 65° C. for 1 minute;

Step 4: 72° C. for 2 minutes, repeated for 8 cycles; and

Step 5: 4° C.
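As a rough planning aid, the programmed hold time of the reaction cycles described above can be tallied as in the following sketch. It is a minimal illustration only: it sums the stated hold times and ignores ramp times between temperatures, which depend on the thermocycler and are not stated here. The function and variable names are hypothetical.

```python
# (temperature in degrees C, hold time in seconds), taken from the steps above
INITIAL_DENATURATION = (93, 180)
CYCLE_STEPS = [(93, 30), (65, 60), (72, 120)]
FINAL_EXTENSION = (72, 300)


def program_seconds(n_cycles, include_initial=True, include_final=True):
    """Sum programmed hold times; ramp times are not modeled."""
    total = n_cycles * sum(seconds for _, seconds in CYCLE_STEPS)
    if include_initial:
        total += INITIAL_DENATURATION[1]
    if include_final:
        total += FINAL_EXTENSION[1]
    return total


first_run = program_seconds(28)
extra_run = program_seconds(8, include_initial=False, include_final=False)
print(f"28-cycle program: {first_run / 60:.0f} min of hold time")
print(f"8 additional cycles: {extra_run / 60:.0f} min of hold time")
```

Under these assumptions the 28-cycle program accounts for 106 minutes of hold time and the eight additional cycles for a further 28 minutes.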

Eight microliters of PCR product and 1.5 μl of loading dye are loaded on a 1.2% agarose gel for analysis after 28 cycles and 36 cycles. Expression levels of specific transcripts are considered low if they are only detectable after 36 cycles of PCR. Expression levels are considered medium or high depending on the levels of transcript compared with observed transcript levels for an internal control such as actin2. Transcript levels are determined in repeat experiments and compared to transcript levels in control (e.g., non-transformed) plants.

Modified phenotypes observed for particular overexpressor or knockout plants may include increased or decreased disease tolerance or resistance. For a particular overexpressor that shows a less beneficial characteristic such as reduced disease resistance or tolerance, it may be more useful to select a plant with a decreased expression of the particular transcription factor. For a particular knockout that shows a beneficial characteristic, such as increased disease resistance or tolerance, it may be more useful to select a plant with an increased expression of the particular transcription factor.

The transcription factor sequences of the Sequence Listing, or those in the present Tables or Figures, and their equivalogs, can be used to prepare transgenic plants and plants with altered traits. The specific transgenic plants listed below are produced from the sequences of the Sequence Listing, as noted. The Sequence Listing and Tables 1, 2, 6 and 7 provide exemplary polynucleotide and polypeptide sequences of the invention.

Example VIII Description and Overexpression of G28 (Polynucleotide and Polypeptide SEQ ID NO: 1 and 2) and Production of Disease Tolerance or Resistance in Plants

This example provides experimental evidence for the disease tolerance or resistance controlled by the transcription factor polynucleotides and polypeptides of the invention, including resistance or tolerance to multiple pathogens provided by G28 and its equivalogs.

Among the goals of these studies was to determine whether altering the expression of G28 or its equivalogs (including those listed in the Sequence Listing) in transgenic plants could confer a significant improvement in pathogen tolerance or resistance. This may be determined by empirical observations of plants that overexpressed G28 or equivalogs after challenge with pathogenic organisms, as compared to control plants similarly treated, as well as by gene expression analyses of these plants for the purpose of demonstrating the expression of direct and indirect pathway targets by G28. These targets generally include specific plant disease resistance genes, including, by way of example but not limitation, genes encoding chitinases, glucanases, enzymes of phytoalexin biosynthesis, defensins, enzymes of lignin biosynthesis, and anti-oxidant activities (e.g., glutathione-S-transferases). The pathway targets may be instrumental in a defense response involving localized programmed cell death of infected host cells (the “hypersensitive response”), the accumulation of anti-pathogenic compounds, and cell-wall reinforcement. The hypersensitive response subsequently leads to systemic induction of defense pathways that prevents further infection in a systemic acquired resistance (SAR; Dong (1998) Curr. Opin. Plant Biol. 1: 316-323). SAR is typically effective against a wide variety of pathogen types and can be characterized as an induced broad-spectrum resistance or tolerance.

In a preferred embodiment, overexpression of G28 or an equivalog leads to SAR, i.e., broad-spectrum resistance or tolerance, by induction of multiple direct and indirect pathway targets.

Published Information. Arabidopsis tdr G28 corresponds to AtERF1 (GenBank accession number AB008103; Fujimoto et al. (2000) supra). G28 appears as gene AT4g17500 in the annotated sequence of Arabidopsis chromosome 4 (AL161546.2).

AtERF1 has been shown to have GCC-box binding activity; some defense-related genes that were induced by ethylene were found to contain a short cis-acting element known as the GCC-box: AGCCGCC (Ohme-Takagi and Shinshi (1990) supra). Using transient assays in Arabidopsis leaves, AtERF1 was found to be able to act as a GCC-box sequence-specific transactivator (Fujimoto et al. (2000) supra).
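A search for the GCC-box element in a candidate promoter can be illustrated with the following minimal sketch. It scans both strands for exact matches to AGCCGCC only; the function names and the example promoter string are hypothetical and are not taken from any sequence disclosed herein.

```python
GCC_BOX = "AGCCGCC"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_gcc_boxes(promoter):
    """Return sorted (0-based start, strand) tuples for exact GCC-box matches."""
    promoter = promoter.upper()
    hits = []
    for strand, motif in (("+", GCC_BOX), ("-", reverse_complement(GCC_BOX))):
        start = promoter.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = promoter.find(motif, start + 1)
    return sorted(hits)


# Made-up promoter fragment used only to exercise the function.
example_promoter = "TTAGCCGCCAAGGCGGCTTT"
print(find_gcc_boxes(example_promoter))
```

Positions are reported relative to the start of the supplied string, so the caller must track where that string sits relative to the transcription start site.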

As noted above, AtERF1 expression has been described to be induced by ethylene (two- to three-fold increase in AtERF1 transcript levels 12 hours after ethylene treatment; Fujimoto et al. (2000) supra). In the ein2 mutant, the expression of AtERF1 was not induced by ethylene, suggesting that the ethylene induction of AtERF1 is regulated under the ethylene signaling pathway (Fujimoto et al. (2000) supra). AtERF1 expression was also induced by wounding, but not by other abiotic stresses (such as cold, salinity, or drought; Fujimoto et al. (2000) supra).

It has been suggested that AtERFs, in general, may act as transcription factors for stress-responsive genes, and that the GCC-box may act as a cis-regulatory element for biotic and abiotic stress signal transduction in addition to its role as an ethylene responsive element (ERE; Fujimoto et al. (2000) supra), but there are no data available on the physiological functions of AtERF1.

Experimental Observations, Disease Resistance. G28 is expressed at higher levels when wild type Arabidopsis plants are inoculated with Erysiphe or Fusarium, or treated with salicylic acid, compared with expression levels of G28 in control untreated samples.

A full length G28 cDNA under the control of the CaMV 35S promoter was transformed into wild-type Arabidopsis plants. Twenty independent transgenic T1 lines were planted and nine of those T1 plants were monitored for the expression of the transgene by RT-PCR. The three highest G28 over-expressing lines were carried to the next generation and scored for disease resistance. To ensure that there was no co-suppression in the generation in which the assays were being performed, the expression of G28 from the transgene was monitored by RT-PCR. A high level of G28 induction was observed in this generation and it was concluded that there was not a high level of co-suppression. When three 35S::G28 lines, G28-10, -11 and -15, were tested for resistance to E. orontii, B. cinerea, and S. sclerotiorum, all three lines exhibited enhanced resistance. The G28-15 and G28-11 lines behaved similarly in all the assays and exhibited phenotypes that were much stronger than line G28-10 as measured by disease severity ratings. This was consistent with results from B. cinerea and S. sclerotiorum assays on the same plant lines grown and assayed in tissue culture. Importantly, G28 overexpression conferred increased resistance to pathogens with very different modes of infection, a surprising result. E. orontii is a biotrophic pathogen whereas the other two are necrotrophic. Because it is known that different defense-related signal transduction pathways are activated in response to different pathogen types (Maleck et al. (1999) Trends Plant Sci. 4: 215-219; Pieterse et al. (1999) Trends Plant Sci. 4: 52-58), these results were unexpected and suggest that G28 is a central player in activating multiple resistance mechanisms. This is the reason that G28 transgenic plants were given high priority for further analysis.

As expected for a transcription factor involved in plant defense responses, RT-PCR analysis showed that G28 is expressed in a variety of Arabidopsis tissues (predominantly in shoot, root, rosette, cauline, and germinating seed) and under several disease-related conditions. Importantly, as shown by real-time PCR analysis, G28 appears to be involved in defense response pathways, since its transcription was activated in response to the defense-related hormones jasmonic acid and salicylic acid as well as the fungal pathogen Botrytis cinerea. G28 was previously shown to be induced by ethylene (Fujimoto et al. (2000) supra), and this was confirmed experimentally using real-time PCR. The pathogenesis related genes PR1 and PDF1.2 were used as controls for this experiment.

PR1 is a known marker of systemic acquired resistance and is salicylic acid-inducible, and PDF1.2 is the best-characterized gene that is induced by jasmonic acid, ethylene and several necrotrophic fungal pathogens (Maleck et al. (1999) supra; Pieterse et al. (1999) supra). PR1 and PDF1.2 induction were consistent with expectations and showed a steady increase following the appropriate treatments. G28 induction by salicylic acid, 1-aminocyclopropane-1-carboxylic acid (ACC) and jasmonic acid occurred within two hours of treatment and was transient even though the treatment continued throughout the experimental time-course. On the other hand, G28 induction by B. cinerea occurred within two hours of fungal treatment and continued to rise throughout the time-course. Importantly, the marker genes for salicylic acid, jasmonic acid and ethylene responses, PR1 and PDF1.2, were found to be constitutively upregulated in the 35S::G28 transgenic plants, suggesting that these genes could be the downstream targets for the activity of G28 (a similar constitutive expression pattern of PR1 and PDF1.2 was observed following microarray analysis of the 35S::G28 transgenics). In fact, PDF1.2 has a GCC-box element in its promoter and is therefore potentially a direct target of G28.

Although G28 transcription was activated in response to ethylene, overexpression of G28 had no effect on the well-studied ethylene response pathway that is involved in a variety of developmental responses, including the so-called triple response of seedlings. That is, transgenic plants over-expressing G28 exhibited a normal triple response. The latter observation supports the conclusion that G28 functions specifically in a defense-response pathway.

Transgenic plants that over-expressed G28 and had enhanced resistance to Erysiphe orontii, Sclerotinia sclerotiorum, and Botrytis cinerea are shown. Three independent CaMV 35S promoter::G28 transgenic lines, -15, -10 and -11, were found to be more tolerant to infection with a moderate dose of the fungal pathogen Erysiphe orontii. Erysiphe spores were obtained from 10 to 14 day old Erysiphe cultures, and inoculations were performed by tapping conidia from 1 to 2 heavily infected leaves onto the mesh cover of a settling tower, brushing the mesh with a camel's hair paint brush to break up the conidial chains, and letting the conidia settle for 10 minutes. Plants were 4 to 4.5 weeks old at the time of inoculation. The mesh had a pore size of 95 microns; the settling towers were 28″ high, and were wide enough to fit over a box of plants (6″×6″ or 6″×8″). Symptoms were evaluated 7-21 days post-inoculation.

Enhanced resistance of 35S::G28-15 to the fungal pathogen Sclerotinia sclerotiorum was also observed. Sclerotinia sclerotiorum hyphal cultures were grown in potato dextrose broth. One gram of hyphae was ground, filtered, spun down and resuspended in sterile water. A 1:10 dilution was used to spray four week-old plants grown under a 12 hour light/dark regime. Two of three independent 35S::G28 transgenic lines infected with Sclerotinia sclerotiorum demonstrated a significant reduction in disease severity as compared to wild-type controls similarly infected.

Enhanced resistance of 35S::G28-15 overexpressing plants to the fungal pathogen Botrytis cinerea was also observed. Botrytis cinerea was grown on potato dextrose agar. A spore culture was made by spreading 10 ml of sterile water on the fungus plate, swirling and transferring spores to 10 ml of sterile water. The spore inoculum (10⁵ spores/ml) was used to spray four week-old plants grown under 12 hour light/dark conditions. Two of three independent 35S::G28 transgenic lines infected with Botrytis cinerea showed a significant reduction in disease severity as compared to wild-type controls similarly infected.

G28 overexpression did not seem to have detrimental effects on plant growth or vigor, since plants from most of the lines were morphologically wild-type. In addition, no difference was detected between those lines and the corresponding wild-type controls in all the biochemical assays that were performed.

Table 7 summarizes subsequent experiments and shows the observed trait and response of transgenic 35S::G28 Arabidopsis plants overexpressing G28 when treated with different plant pathogens over particular time periods when inoculated with a plant pathogen (Botrytis, Sclerotinia, or Erysiphe). The first column shows the trait or response category to be analyzed (Response Category); the second column shows the conditions used for the assay (Assay Type and Medium); the third column shows the pathogen species inoculated onto the plant (Description of Pathogen); the fourth column shows the resulting response of the inoculated transgenic plant to the pathogen (Results of Inoculation with Pathogen of Transgenic Arabidopsis Plants). Transgenic Arabidopsis plants overexpressing G28 under the control of the CaMV 35S promoter were found to be more tolerant to pathogens when inoculated with Botrytis, Erysiphe, or Sclerotinia, compared with wild type control plants similarly treated.

TABLE 7
Results of pathogen challenge on Transgenic Arabidopsis plants

Assay Type      Description    Results of Inoculation with Pathogen
and Medium      of Pathogen    of Transgenic Arabidopsis plants

Growth/Plate    Botrytis       35S::G28: More tolerant
Growth/Plate    Sclerotinia    35S::G28: More tolerant
Growth/Plate    Botrytis       35S::G28: Repeat experiment: Individual lines: More tolerant
Growth/Plate    Sclerotinia    35S::G28: Repeat experiment: Individual lines: More tolerant
Growth/Soil     Erysiphe       35S::G28: Less fungal growth on 8 out of 9 plants.
Growth/Soil     Erysiphe       35S::G28: Repeat experiment: Individual lines. Less fungal growth on plants from all 3 lines

Transgenic Arabidopsis plants over-expressing SEQ ID NO:1 (plant G28-11) were more tolerant to pathogens and had less fungal growth when inoculated with Erysiphe orontii compared with wild type control plants (plant Col) similarly treated. Leaves from a transgenic Arabidopsis plant over-expressing SEQ ID NO:1 (leaves G28-11) had less fungal growth when inoculated with Erysiphe orontii compared with wild type control plant (leaves Col) similarly treated.

Transgenic Arabidopsis seedlings over-expressing SEQ ID NO:1 (seedlings G28-15) were more tolerant to pathogen and had more vigorous growth five days following inoculation with Sclerotinia sclerotiorum compared with control seedlings transformed with only the pMEN65 vector (seedlings PMen65) and similarly inoculated with Sclerotinia. Control seedlings were engulfed with fungal hyphae whereas the transgenic seedlings comprising SEQ ID NO: 1 (G28) were tolerant to the presence of hyphae and continued to grow.

Table 8 shows the increased levels of G28 (SEQ ID NO:1), G1006 (SEQ ID NO: 3), and G1004 (SEQ ID NO: 5) in transgenic 35S::G28 Arabidopsis plants overexpressing G28 when treated with different plant pathogens or methyl jasmonate over particular time periods. The results were determined by microarray analysis using a proprietary Arabidopsis microarray chip. The first column indicates the type of treatment. Columns two through four show the fold increase of the endogenous transcribed polynucleotide levels compared with endogenous levels of an untreated control plant sample, untreated control sample fold levels normalized to 1.00; the second column shows the fold increase of SEQ ID NO: 1 (G28); the third column shows the fold increase of SEQ ID NO: 3 (G1006); the fourth column shows the fold increase of SEQ ID NO: 5 (G1004).

TABLE 8
Increase of endogenous transcript in 35S::G28 Arabidopsis plants overexpressing G28

                                 X-fold increase of endogenous transcript*
                                 G28             G1006           G1004
Treatment                        SEQ ID NO: 1    SEQ ID NO: 3    SEQ ID NO: 5

Botrytis 12 hours                2.61            2.57            3.34
Fusarium 24 hours                3.08            3.45            1.83
Fusarium 48 hours                2.33            1.95            1.54
Erysiphe 7 days                  2.15            2.78            1.19
Methyl jasmonate 24 hours        2.26            1.71            1.03
35S::G28 & Botrytis 2 hours      1.43            1.37            2.17
35S::G28 & Botrytis 12 hours     9.99            5.55            1.62
35S::G28 & Botrytis 48 hours     1.37            1.5             2.44

*(control X = 1.00)

Novel Utilities Based on Functional Observations. G28 (AtERF1; SEQ ID NO: 2) was shown to be a key regulator of the plant defense response by overexpressing AtERF1 in transgenic Arabidopsis plants. In these experiments, this gene was shown to provide enhanced resistance to different economically important fungal pathogens, including Erysiphe orontii, Botrytis cinerea, Fusarium oxysporum and Sclerotinia sclerotiorum. Erysiphe species or so-called powdery mildews are obligate biotrophs and will only grow on healthy leaves. Botrytis and Sclerotinia are necrotrophic pathogens that kill host cells to extract nutrients. Fusarium oxysporum, a necrotrophic fungal pathogen, was chosen because, unlike the aforementioned fungal pathogens that are foliar pathogens, F. oxysporum primarily infects roots. F. oxysporum is a vascular pathogen causing a variety of disease symptoms including chlorosis (yellowing), stunting, wilting, root rot, and head blight of wheat and barley. Fusarium species also synthesize a wide range of phytotoxic compounds, including the sphinganine analogue mycotoxins.

It was surprising that overexpression of a single transcription factor led to enhanced resistance against all three of these fungal pathogens.

Therefore, G28 or its equivalogs can be used to manipulate the defense response in order to generate pathogen-resistant plants. Furthermore, a unique motif, Motif Y (SEQ ID NO: 55), was discovered in G28 orthologs in monocots, but not in dicots, upstream of the conserved AP2 domain of G28. This motif is likely conserved because it functions in a disease tolerance-inducing capacity, and thus monocot-derived G28 equivalogs that comprise Motif Y may be used to enhance disease tolerance in monocots.

Example IX Identification of Homologous Sequences by Computer HomologySearch

This example describes identification of genes that are orthologous toArabidopsis thaliana transcription factors from a computer homologysearch.

Homologous sequences, including those of paralogs and orthologs fromArabidopsis and other plant species, were identified using databasesequence search tools, such as the Basic Local Alignment Search Tool(BLAST; Altschul et al. (1990) supra; and Altschul et al. (1997) NucleicAcid Res. 25: 3389-3402). The tblastx sequence analysis programs wereemployed using the BLOSUM-62 scoring matrix (Henikoff and Henikoff(1992) Proc. Natl. Acad. Sci. USA 89: 10915-10919). The entire NCBIGenBank database was filtered for sequences from all plants exceptArabidopsis thaliana by selecting all entries in the NCBI GenBankdatabase associated with NCBI taxonomic ID 33090 (Viridiplantae; allplants) and excluding entries associated with taxonomic ID 3701(Arabidopsis thaliana).

These sequences were compared to sequences representing transcription factor genes presented in the Sequence Listing, using the Washington University TBLASTX algorithm (version 2.0a19MP) at the default settings using gapped alignments with the filter “off”. For each transcription factor gene in the Sequence Listing, individual comparisons were ordered by probability score (P-value), where the score reflects the probability that a particular alignment occurred by chance. For example, a score of 3.6e-40 is 3.6×10⁻⁴⁰. In addition to P-values, comparisons were also scored by percentage identity. Percentage identity reflects the degree to which two segments of DNA or protein are identical over a particular length. Examples of sequences so identified are presented in, for example, Table 2, 6 or 7. Paralogous or orthologous sequences were readily identified and available in GenBank by GenBank Accession Number or Test Sequence Annotation (e.g., see Table 6). The percent sequence identity among these sequences can be as low as 47%, or even lower.
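The two scores described above may be illustrated with a minimal sketch. The hit names, P-values and aligned segments below are hypothetical, and the identity calculation assumes two pre-aligned segments of equal length; it is not the TBLASTX implementation.

```python
def percent_identity(seg_a, seg_b):
    """Percent of aligned positions that are identical (gap characters never match)."""
    if len(seg_a) != len(seg_b) or not seg_a:
        raise ValueError("segments must be aligned, equal in length and non-empty")
    matches = sum(1 for a, b in zip(seg_a, seg_b) if a == b and a != "-")
    return 100.0 * matches / len(seg_a)


# Hypothetical hits: (name, P-value, query segment, subject segment).
hits = [
    ("hypothetical_hit_1", 2.0e-12, "MAEGMLLP", "MAEGLLLP"),
    ("hypothetical_hit_2", 3.6e-40, "HPIYRGV", "HPIYRGV"),
    ("hypothetical_hit_3", 1.5e-03, "HPIYRGV", "HPLFKGV"),
]

# Order comparisons by P-value: the smallest value is the least likely
# to have arisen by chance, so it is listed first.
ranked = sorted(hits, key=lambda hit: hit[1])

for name, p_value, query, subject in ranked:
    print(name, f"{p_value:.1e}", f"{percent_identity(query, subject):.1f}%")
```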

Candidate paralogous sequences were identified among Arabidopsis transcription factors through alignment, identity, and phylogenetic relationships. G1006 (SEQ ID NO: 4), a paralog of G28, may be found in the Sequence Listing.

Candidate orthologous sequences were identified from proprietary unigene sets of plant gene sequences in Zea mays, Glycine max and Oryza sativa based on significant homology to Arabidopsis transcription factors. These candidates were reciprocally compared to the set of Arabidopsis transcription factors. If the candidate showed maximal similarity in the protein domain to the eliciting transcription factor or to a paralog of the eliciting transcription factor, then it was considered to be an ortholog. Identified non-Arabidopsis sequences that were shown in this manner to be orthologous to the Arabidopsis sequences are provided in, for example, Tables 2, 6 and 7.
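The reciprocal comparison rule above may be sketched as follows. The candidate names and their best Arabidopsis matches are hypothetical; the only relationship taken from the text is that G1006 is a paralog of G28. The sketch assumes that the best-matching Arabidopsis transcription factor for each candidate has already been determined by the reciprocal search.

```python
def is_candidate_ortholog(best_arabidopsis_hit, eliciting_tf, paralogs):
    """Apply the rule: best reciprocal match is the eliciting factor or one of its paralogs."""
    return (
        best_arabidopsis_hit == eliciting_tf
        or best_arabidopsis_hit in paralogs.get(eliciting_tf, set())
    )


# Paralog relationship stated in the text.
PARALOGS = {"G28": {"G1006"}}

# Hypothetical reciprocal results: candidate -> best-matching Arabidopsis factor.
reciprocal_best_hits = {
    "hypothetical_candidate_A": "G28",
    "hypothetical_candidate_B": "G1006",
    "hypothetical_candidate_C": "hypothetical_other_TF",
}

orthologs = sorted(
    name
    for name, best in reciprocal_best_hits.items()
    if is_candidate_ortholog(best, "G28", PARALOGS)
)
print(orthologs)
```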

Example X Identification of Orthologous and Paralogous Sequences by PCR

Orthologs to Arabidopsis genes may be identified by several methods, including hybridization, amplification, or bioinformatic analysis. This example describes how one may identify equivalogs to the Arabidopsis AP2 family transcription factor CBF1 (polynucleotide SEQ ID NO: 45, encoded polypeptide SEQ ID NO: 46), which confers tolerance to abiotic stresses (Thomashow et al. (2002) U.S. Pat. No. 6,417,428), and an example to confirm the function of homologous sequences. In this example, orthologs to CBF1 were found in canola (Brassica napus) using polymerase chain reaction (PCR).

Degenerate primers were designed for regions of the AP2 binding domain and outside of the AP2 domain (carboxyl terminal domain):

(SEQ ID NO: 53) Mol 368 (reverse) 5′- CAY CCN ATH TAY MGN GGN GT -3′
(SEQ ID NO: 54) Mol 378 (forward) 5′- GGN ARN ARC ATH CCY TCN GCC -3′
(Y: C/T, N: A/C/G/T, H: A/C/T, M: A/C, R: A/G)

Primer Mol 368 is in the AP2 binding domain of CBF1 (amino acid sequence: His-Pro-Ile-Tyr-Arg-Gly-Val) while primer Mol 378 is outside the AP2 domain (carboxyl terminal domain; amino acid sequence: Met-Ala-Glu-Gly-Met-Leu-Leu-Pro).
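As an illustrative check only, the degeneracy of each primer, and the agreement between the complete codons of Mol 368 and the amino acid sequence His-Pro-Ile-Tyr-Arg-Gly, may be computed as sketched below. The sketch assumes the standard genetic code and the degeneracy symbols defined above; primer symbols are upper-cased before use, and the function names are hypothetical.

```python
from itertools import product

# Degeneracy symbols as defined in the primer legend, plus the four bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "CT", "R": "AG", "M": "AC", "H": "ACT", "N": "ACGT"}

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

MOL_368 = "CAY CCN ATH TAY MGN GGN GT"
MOL_378 = "GGN ARN ARC ATH CCY TCN GCC"


def degeneracy(primer):
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    count = 1
    for symbol in primer.replace(" ", "").upper():
        count *= len(IUPAC[symbol])
    return count


def codon_amino_acids(codon):
    """Set of amino acids encoded by every expansion of one degenerate codon."""
    expansions = product(*(IUPAC[symbol] for symbol in codon.upper()))
    return {CODON_TABLE["".join(c)] for c in expansions}


print(degeneracy(MOL_368), degeneracy(MOL_378))

# Drop the trailing partial codon "GT" and compare with His-Pro-Ile-Tyr-Arg-Gly.
encoded = [codon_amino_acids(c) for c in MOL_368.split()[:-1]]
print(all(aa in options for aa, options in zip("HPIYRG", encoded)))
```

Mol 378 reads as the complementary strand of its stated amino acid sequence, so only its degeneracy is computed here.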

The genomic DNA isolated from B. napus was PCR-amplified using these primers under the following conditions: an initial denaturation step of 2 minutes at 93° C.; 35 cycles of 93° C. for 1 minute, 55° C. for 1 minute, and 72° C. for 1 minute; and a final incubation of 7 minutes at 72° C. at the end of cycling.
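For illustration, the cycling program above may be written down as data and its programmed hold time totalled. This is a sketch only; the class and variable names are hypothetical, and ramp times between temperatures are not considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # hold temperature in degrees Celsius
    minutes: int  # hold time in minutes


INITIAL_DENATURATION = Step(93, 2)
CYCLE = (Step(93, 1), Step(55, 1), Step(72, 1))  # denature, anneal, extend
N_CYCLES = 35
FINAL_INCUBATION = Step(72, 7)


def total_hold_minutes():
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return INITIAL_DENATURATION.minutes + N_CYCLES * per_cycle + FINAL_INCUBATION.minutes


print(total_hold_minutes())  # 2 + 35 * 3 + 7 = 114
```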

The PCR products were separated by electrophoresis on a 1.2% agarose gel, transferred to nylon membrane, and hybridized with the AtCBF1 probe prepared from Arabidopsis genomic DNA by PCR amplification. The hybridized products were visualized by a colorimetric detection system (Boehringer Mannheim) and the corresponding bands from a similar agarose gel were isolated using the Qiagen Extraction Kit (Qiagen, Valencia Calif.). The DNA fragments were ligated into the TA clone vector from the TOPO TA Cloning Kit (Invitrogen Corporation, Carlsbad Calif.) and transformed into E. coli strain TOP10 (Invitrogen).

Seven colonies were picked and, after plasmid DNA isolation, the inserts were sequenced on both the sense and antisense strands on an ABI 377 machine. The DNA sequence was edited by sequencer and aligned with AtCBF1 using GCG software and NCBI BLAST searching.

The nucleic acid sequence and amino acid sequence of one canola ortholog identified by this process (bnCBF1; polynucleotide SEQ ID NO: 51 and polypeptide SEQ ID NO: 52) are shown in the Sequence Listing.

The aligned amino acid sequences show that the bnCBF1 gene has 88% identity with the Arabidopsis sequence in the AP2 domain region and 85% identity with the Arabidopsis sequence outside the AP2 domain when aligned for two insertion sequences that are outside the AP2 domain.

Similarly, paralogous sequences to Arabidopsis genes, such as CBF1, may also be identified.

Two paralogs of CBF1 from Arabidopsis thaliana, CBF2 and CBF3, have been cloned and sequenced as described below. The sequences of the DNA (SEQ ID NO: 47 and 49) and encoded proteins (SEQ ID NO: 48 and 50) are set forth in the Sequence Listing.

A lambda cDNA library prepared from RNA isolated from Arabidopsis thaliana ecotype Columbia (Lin and Thomashow (1992) Plant Physiol. 99: 519-525) was screened for recombinant clones that carried inserts related to the CBF1 gene (Stockinger et al. (1997) Proc. Natl. Acad. Sci. USA 94:1035-1040). CBF1 was ³²P-radiolabeled by random priming (Sambrook et al. (1989) supra) and used to screen the library by the plaque-lift technique using standard stringent hybridization and wash conditions (Hajela et al. (1990) Plant Physiol. 93:1246-1252; Sambrook et al. (1989) supra; 6×SSPE buffer, 60° C. for hybridization and 0.1×SSPE buffer, 60° C. for washes). Twelve positively hybridizing clones were obtained and the DNA sequences of the cDNA inserts were determined. The results indicated that the clones fell into three classes. One class carried inserts corresponding to CBF1. The two other classes carried sequences corresponding to two different homologs of CBF1, designated CBF2 and CBF3. The nucleic acid sequences and predicted protein coding sequences for Arabidopsis CBF1, CBF2 and CBF3 are listed in the Sequence Listing (SEQ ID NOs: 45, 47, 49 and SEQ ID NOs: 46, 48, 50, respectively). The nucleic acid sequence and predicted protein coding sequence for the Brassica napus CBF ortholog are listed in the Sequence Listing (SEQ ID NOs: 51 and 52, respectively).

A comparison of the nucleic acid sequences of Arabidopsis CBF1, CBF2 and CBF3 indicates that they are 83 to 85% identical, as shown in Table 9.

TABLE 9 Identity comparison of Arabidopsis CBF1, CBF2 and CBF3

             Percent identity^(a)
             DNA^(b)    Polypeptide
cbf1/cbf2    85         86
cbf1/cbf3    83         84
cbf2/cbf3    84         85

^(a)Percent identity was determined using the Clustal algorithm from the Megalign program (DNASTAR, Inc.).
^(b)Comparisons of the nucleic acid sequences of the open reading frames are shown.
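As a simple worked check, the identity ranges quoted in the text can be recovered from the Table 9 values. The sketch below only restates the table; the variable names are hypothetical.

```python
# Table 9 values: pair -> (DNA percent identity, polypeptide percent identity).
TABLE_9 = {
    ("cbf1", "cbf2"): (85, 86),
    ("cbf1", "cbf3"): (83, 84),
    ("cbf2", "cbf3"): (84, 85),
}

dna_values = [dna for dna, _ in TABLE_9.values()]
polypeptide_values = [pep for _, pep in TABLE_9.values()]

dna_range = (min(dna_values), max(dna_values))
polypeptide_range = (min(polypeptide_values), max(polypeptide_values))

print("DNA identity range:", dna_range)
print("Polypeptide identity range:", polypeptide_range)
```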

Similarly, the amino acid sequences of the three CBF polypeptides range from 84 to 86% identity. An alignment of the three amino acid sequences reveals that most of the differences in amino acid sequence occur in the acidic C-terminal half of the polypeptide. This region of CBF1 serves as an activation domain in both yeast and Arabidopsis (not shown).

Residues 47 to 106 of CBF1 correspond to the AP2 domain of the protein, a DNA binding motif that, to date, has only been found in plant proteins. A comparison of the AP2 domains of CBF1, CBF2 and CBF3 indicates that there are a few differences in amino acid sequence. These differences in amino acid sequence might have an effect on DNA binding specificity.

Example XI Transformation of Canola with a Plasmid Containing CBF1, CBF2, or CBF3

After identifying homologous genes to CBF1, canola was transformed with a plasmid containing the Arabidopsis CBF1, CBF2, or CBF3 genes cloned into the vector pGA643 (An (1987) Methods Enzymol. 153: 292). In these constructs the CBF genes were expressed constitutively under the CaMV 35S promoter. In addition, the CBF1 gene was cloned under the control of the Arabidopsis COR15 promoter in the same vector pGA643. Each construct was transformed into Agrobacterium strain GV3101. Transformed Agrobacteria were grown for 2 days in minimal AB medium containing appropriate antibiotics.

Spring canola (B. napus cv. Westar) was transformed using the protocol of Moloney et al. (1989) Plant Cell Reports 8: 238, with some modifications as described. Briefly, seeds were sterilized and plated on half strength MS medium, containing 1% sucrose. Plates were incubated at 24° C. under 60-80 μE/m²s light using a 16 hour light/8 hour dark photoperiod. Cotyledons from 4-5 day old seedlings were collected, and the petioles were cut and dipped into the Agrobacterium solution. The dipped cotyledons were placed on co-cultivation medium at a density of 20 cotyledons/plate and incubated as described above for 3 days. Explants were transferred to the same media, but containing 300 mg/l timentin (SmithKline Beecham, Pa.) and thinned to ten cotyledons/plate. After 7 days explants were transferred to Selection/Regeneration medium. Transfers were continued every 2-3 weeks (2 or 3 times) until shoots had developed. Shoots were transferred to Shoot-Elongation medium every 2-3 weeks. Healthy looking shoots were transferred to rooting medium. Once good roots had developed, the plants were placed into moist potting soil.

The transformed plants were then analyzed for the presence of the NPTII gene/kanamycin resistance by ELISA, using the ELISA NPTII kit from 5Prime-3Prime Inc. (Boulder, Colo.). Approximately 70% of the screened plants were NPTII positive. Only those plants were further analyzed.

Northern blot analysis of the plants that were transformed with the constitutively expressing constructs showed expression of the CBF genes, and all CBF genes were capable of inducing the Brassica napus cold-regulated gene BN115 (homolog of the Arabidopsis COR15 gene). Most of the transgenic plants appear to exhibit a normal growth phenotype. As expected, the transgenic plants are more freezing tolerant than the wild-type plants. Using the electrolyte leakage of leaves test, the control showed a 50% leakage at −2° to −3° C. Spring canola transformed with either CBF1 or CBF2 showed a 50% leakage at −6° to −7° C. Spring canola transformed with CBF3 shows a 50% leakage at about −10° to −15° C. Winter canola transformed with CBF3 may show a 50% leakage at about −16° to −20° C. Furthermore, if the spring or winter canola are cold acclimated the transformed plants may exhibit a further increase in freezing tolerance of at least −2° C.

To test salinity tolerance of the transformed plants, plants were watered with 150 mM NaCl. Plants overexpressing CBF1, CBF2, or CBF3 grew better compared with plants that had not been transformed with CBF1, CBF2, or CBF3.

These results demonstrate that equivalogs of Arabidopsis transcription factors can be identified and shown to confer similar functions in non-Arabidopsis plant species.

Example XII Screen of Plant cDNA Library for Sequence Encoding a Transcription Factor DNA Binding Domain and Demonstration of Protein Transcription Regulation Activity

The “one-hybrid” strategy (Li and Herskowitz (1993) Science 262: 1870-1874) is used to screen for plant cDNA clones encoding a polypeptide comprising a transcription factor DNA binding domain, a conserved domain. In brief, yeast strains are constructed that contain a lacZ reporter gene with either wild-type or mutant transcription factor binding promoter element sequences in place of the normal UAS (upstream activator sequence) of the GAL1 promoter. Yeast reporter strains are constructed that carry transcription factor binding promoter element sequences as UAS elements operably linked upstream (5′) of a lacZ reporter gene with a minimal GAL1 promoter. The strains are transformed with a plant expression library that contains random cDNA inserts fused to the GAL4 activation domain (GAL4-ACT) and screened for blue colony formation on X-gal-treated filters (X-gal: 5-bromo-4-chloro-3-indolyl-β-D-galactoside; Invitrogen Corporation, Carlsbad Calif.). Alternatively, the strains are transformed with a cDNA polynucleotide encoding a known transcription factor DNA binding domain polypeptide sequence.

Yeast strains carrying these reporter constructs produce low levels of beta-galactosidase and form white colonies on filters containing X-gal. The reporter strains carrying wild-type transcription factor binding promoter element sequences are transformed with a polynucleotide that encodes a polypeptide comprising a plant transcription factor DNA binding domain operably linked to the acidic activator domain of the yeast GAL4 transcription factor, “GAL4-ACT”. The clones that contain a polynucleotide encoding a transcription factor DNA binding domain operably linked to GAL4-ACT can bind upstream of the lacZ reporter genes carrying the wild-type transcription factor binding promoter element sequence, activate transcription of the lacZ gene and result in yeast forming blue colonies on X-gal-treated filters.

Upon screening about 2×10⁶ yeast transformants, positive cDNA clones are isolated; i.e., clones that cause yeast strains carrying lacZ reporters operably linked to wild-type transcription factor binding promoter elements to form blue colonies on X-gal-treated filters. The cDNA clones do not cause a yeast strain carrying mutant transcription factor binding promoter elements fused to lacZ to turn blue. Thus, a polynucleotide encoding a transcription factor DNA binding domain, a conserved domain, is shown to activate transcription of a gene.

Example XIII Gel Shift Assays

The presence of a transcription factor comprising a DNA binding domain that binds to a DNA transcription factor binding element is evaluated using the following gel shift assay. The transcription factor is recombinantly expressed and isolated from E. coli or isolated from plant material. Total soluble protein, including transcription factor, (40 ng) is incubated at room temperature in 10 μl of 1× binding buffer (15 mM HEPES (pH 7.9), 1 mM EDTA, 30 mM KCl, 5% glycerol, 5% bovine serum albumin, 1 mM DTT) plus 50 ng poly(dI-dC):poly(dI-dC) (Pharmacia, Piscataway N.J.) with or without 100 ng competitor DNA. After 10 minutes incubation, probe DNA comprising a DNA transcription factor binding element (1 ng) that has been ³²P-labeled by end-filling (Sambrook et al. (1989) supra) is added and the mixture incubated for an additional 10 minutes. Samples are loaded onto polyacrylamide gels (4% w/v) and fractionated by electrophoresis at 150V for 2 h (Sambrook et al. (1989) supra). The degree of transcription factor-probe DNA binding is visualized using autoradiography. Probes and competitor DNAs are prepared from oligonucleotide inserts ligated into the BamHI site of pUC118 (Vieira et al. (1987) Methods Enzymol. 153: 3-11). Orientation and concatenation number of the inserts are determined by dideoxy DNA sequence analysis (Sambrook et al. (1989) supra). Inserts are recovered after restriction digestion with EcoRI and HindIII and fractionation on polyacrylamide gels (12% w/v; Sambrook et al. (1989) supra).

Example XIV Cloning of Transcription Factor Promoters

Promoters are isolated from transcription factor genes that have gene expression patterns useful for a range of applications, as determined by methods well known in the art (including transcript profile analysis with cDNA or oligonucleotide microarrays, Northern blot analysis, semi-quantitative or quantitative RT-PCR). Interesting gene expression profiles are revealed by determining transcript abundance for a selected transcription factor gene after exposure of plants to a range of different experimental conditions, and in a range of different tissue or organ types, or developmental stages. Experimental conditions to which plants are exposed for this purpose include cold, heat, drought, osmotic challenge, varied hormone concentrations (ABA, GA, auxin, cytokinin, salicylic acid, brassinosteroid), and pathogen and pest challenge. The tissue types and developmental stages include stem, root, flower, rosette leaves, cauline leaves, siliques, germinating seed, and meristematic tissue. The set of expression levels provides a pattern that is determined by the regulatory elements of the gene promoter.

Transcription factor promoters for the genes disclosed herein are obtained by cloning 1.5 kb to 2.0 kb of genomic sequence immediately upstream of the translation start codon for the coding sequence of the encoded transcription factor protein. This region includes the 5′-UTR of the transcription factor gene, which can comprise regulatory elements. The 1.5 kb to 2.0 kb region is cloned through PCR methods, using primers that include one in the 3′ direction located at the translation start codon (including appropriate adaptor sequence), and one in the 5′ direction located from 1.5 kb to 2.0 kb upstream of the translation start codon (including appropriate adaptor sequence). The desired fragments are PCR-amplified from Arabidopsis Col-0 genomic DNA using high-fidelity Taq DNA polymerase to minimize the incorporation of point mutation(s). The cloning primers incorporate two rare restriction sites, such as NotI and SfiI, found at low frequency throughout the Arabidopsis genome. Additional restriction sites are used in the instances where a NotI or SfiI restriction site is present within the promoter.
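The selection of the upstream region, and the check for internal NotI or SfiI sites that would call for alternative restriction sites, may be sketched as follows. This is an illustrative sketch only: the toy sequence and function names are hypothetical, and the recognition sequences used (GCGGCCGC for NotI and GGCCNNNNNGGCC for SfiI) are the commonly cited ones rather than values taken from this specification.

```python
import re

# Commonly cited recognition sequences (assumed here, not stated in the text).
SITES = {"NotI": "GCGGCCGC", "SfiI": "GGCC[ACGT]{5}GGCC"}


def upstream_region(genomic_seq, start_codon_index, length):
    """Return up to `length` bases immediately upstream of the start codon."""
    begin = max(0, start_codon_index - length)
    return genomic_seq[begin:start_codon_index]


def internal_sites(promoter):
    """Map each enzyme to the positions of its sites within the promoter."""
    seq = promoter.upper()
    # A lookahead is used so that overlapping sites are also reported.
    return {name: [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]
            for name, pattern in SITES.items()}


def usable_enzymes(promoter):
    """Enzymes whose sites are absent from the promoter and so may flank it."""
    return [name for name, positions in internal_sites(promoter).items() if not positions]


# Toy sequence standing in for genomic DNA; a real region would be 1.5-2.0 kb.
genome = "A" * 20 + "GCGGCCGC" + "T" * 30 + "ATG" + "CCC"
atg = genome.index("ATG")
promoter = upstream_region(genome, atg, 40)

print(len(promoter), internal_sites(promoter), usable_enzymes(promoter))
```

Because both recognition patterns are palindromic, scanning one strand is sufficient in this sketch.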

The 1.5-2.0 kb fragment upstream from the translation start codon, including the 5′-untranslated region of the transcription factor, is cloned in a binary transformation vector immediately upstream of a suitable reporter gene, or a transactivator gene that is capable of programming expression of a reporter gene in a second gene construct. Reporter genes used include green fluorescent protein (and related fluorescent protein color variants), beta-glucuronidase, and luciferase. Suitable transactivator genes include LexA-GAL4, along with a transactivatable reporter in a second binary plasmid (as disclosed in U.S. patent application Ser. No. 09/958,131, incorporated herein by reference). The binary plasmid(s) is transferred into Agrobacterium and the structure of the plasmid confirmed by PCR. These strains are introduced into Arabidopsis plants as described in other examples, and gene expression patterns determined according to standard methods known to one skilled in the art for monitoring GFP fluorescence, beta-glucuronidase activity, or luminescence.

Example XV Transformation of Dicots

Transcription factor sequences listed in the Sequence Listing recombined into pMEN20 or pMEN65 expression vectors are transformed into a plant for the purpose of modifying plant traits. The cloning vector may be introduced into a variety of plants by means well known in the art such as, for example, direct DNA transfer or Agrobacterium tumefaciens-mediated transformation. It is now routine to produce transgenic plants using most dicot plants (see Weissbach and Weissbach, (1989) supra; Gelvin et al. (1990) supra; Herrera-Estrella et al. (1983) supra; Bevan (1984) supra; and Klee (1985) supra). Methods for analysis of traits are routine in the art and examples are disclosed above.

Numerous protocols for the transformation of tomato and soy plants have been previously described, and are well known in the art. Gruber et al. ((1993) in Methods in Plant Molecular Biology and Biotechnology, p. 89-119, Glick and Thompson, eds., CRC Press, Inc., Boca Raton) describe several expression vectors and culture methods that may be used for cell or tissue transformation and subsequent regeneration. For soybean transformation, methods are described by Miki et al. (1993) in Methods in Plant Molecular Biology and Biotechnology, p. 67-88, Glick and Thompson, eds., CRC Press, Inc., Boca Raton; and U.S. Pat. No. 5,563,055, (Townsend and Thomas), issued Oct. 8, 1996.

There are a substantial number of alternatives to Agrobacterium-mediated transformation protocols for the purpose of transferring exogenous genes into soybeans or tomatoes. One such method is microprojectile-mediated transformation, in which DNA on the surface of microprojectile particles is driven into plant tissues with a biolistic device (see, for example, Sanford et al., (1987) Part. Sci. Technol. 5:27-37; Christou et al. (1992) Plant. J. 2: 275-281; Sanford (1993) Methods Enzymol. 217: 483-509; Klein et al. (1987) Nature 327: 70-73; U.S. Pat. No. 5,015,580 (Christou et al), issued May 14, 1991; and U.S. Pat. No. 5,322,783 (Tomes et al.), issued Jun. 21, 1994).

Alternatively, sonication methods (see, for example, Zhang et al. (1991) Bio/Technology 9: 996-997); direct uptake of DNA into protoplasts using CaCl₂ precipitation, polyvinyl alcohol or poly-L-ornithine (see, for example, Hain et al. (1985) Mol. Gen. Genet. 199: 161-168; Draper et al., Plant Cell Physiol. 23: 451-458 (1982)); liposome or spheroplast fusion (see, for example, Deshayes et al. (1985) EMBO J., 4: 2731-2737; Christou et al. (1987) Proc. Natl. Acad. Sci. U.S.A. 84: 3962-3966); and electroporation of protoplasts and whole cells and tissues (see, for example, Donn et al. (1990) in Abstracts of VIIth International Congress on Plant Cell and Tissue Culture IAPTC, A2-38: 53; D'Halluin et al. (1992) Plant Cell 4: 1495-1505; and Spencer et al. (1994) Plant Mol. Biol. 24: 51-61) have been used to introduce foreign DNA and expression vectors into plants.

After a plant or plant cell is transformed (and the latter regenerated into a plant), the transformed plant may be crossed with itself or a plant from the same line, a non-transformed or wild-type plant, or another transformed plant from a different transgenic line of plants. Crossing provides the advantages of producing new and often stable transgenic varieties. Genes and the traits they confer that have been introduced into a tomato or soybean line may be moved into a distinct line of plants using traditional backcrossing techniques well known in the art.

Transformation of tomato plants may be conducted using the protocols of Koornneef et al. (1986) In Tomato Biotechnology: Alan R. Liss, Inc., 169-178, and in U.S. Pat. No. 6,613,962, the latter method described in brief here. Eight day old cotyledon explants are precultured for 24 hours in Petri dishes containing a feeder layer of Petunia hybrida suspension cells plated on MS medium with 2% (w/v) sucrose and 0.8% agar supplemented with 10 μM α-naphthalene acetic acid and 4.4 μM 6-benzylaminopurine. The explants are then infected for 5-10 minutes with a diluted overnight culture of Agrobacterium tumefaciens containing an expression vector comprising a polynucleotide of the invention, blotted dry on sterile filter paper and cocultured for 48 hours on the original feeder layer plates. Culture conditions are as described above. Overnight cultures of Agrobacterium tumefaciens are diluted in liquid MS medium with 2% (w/v) sucrose, pH 5.7, to an OD₆₀₀ of 0.8.

Following cocultivation, the cotyledon explants are transferred to Petri dishes with selective medium comprising MS medium with 4.56 μM zeatin, 67.3 μM vancomycin, 418.9 μM cefotaxime and 171.6 μM kanamycin sulfate, and cultured under the culture conditions described above. The explants are subcultured every three weeks onto fresh medium. Emerging shoots are dissected from the underlying callus and transferred to glass jars with selective medium without zeatin to form roots. The formation of roots in a kanamycin sulfate-containing medium is a positive indication of a successful transformation.

Transformation of soybean plants may be conducted using the methods found in, for example, U.S. Pat. No. 5,563,055 (Townsend et al., issued Oct. 8, 1996), described in brief here. In this method soybean seed is surface sterilized by exposure to chlorine gas evolved in a glass bell jar. Seeds are germinated by plating on 1/10 strength agar solidified medium without plant growth regulators and culturing at 28° C. with a 16 hour day length. After three or four days, seed may be prepared for cocultivation. The seedcoat is removed and the elongating radicle removed 3-4 mm below the cotyledons.

Overnight cultures of Agrobacterium tumefaciens harboring the expression vector comprising a polynucleotide of the invention are grown to log phase, pooled, and concentrated by centrifugation. Inoculations are conducted in batches such that each plate of seed is treated with a newly resuspended pellet of Agrobacterium. The pellets are resuspended in 20 ml inoculation medium. The inoculum is poured into a Petri dish containing prepared seed and the cotyledonary nodes are macerated with a surgical blade. After 30 minutes the explants are transferred to plates of the same medium that has been solidified. Explants are embedded with the adaxial side up and level with the surface of the medium and cultured at 22° C. for three days under white fluorescent light. These plants may then be regenerated according to methods well established in the art, such as by moving the explants after three days to a liquid counter-selection medium (see U.S. Pat. No. 5,563,055).

The explants may then be picked, embedded and cultured in solidified selection medium. After one month on selective media transformed tissue becomes visible as green sectors of regenerating tissue against a background of bleached, less healthy tissue. Explants with green sectors are transferred to an elongation medium. Culture is continued on this medium with transfers to fresh plates every two weeks. When shoots are 0.5 cm in length they may be excised at the base and placed in a rooting medium.

Example XVI Transformation and Increased Disease Resistance in Monocots

Cereal plants such as, but not limited to, corn, wheat, rice, sorghum, or barley, may also be transformed with the present polynucleotide sequences, including monocot or dicot-derived sequences such as those presented in Table 2, or AP2 transcription factor genes that encode Motif Y (SEQ ID NO: 55) or a subsequence substantially identical to Motif Y, cloned into a vector such as pGA643 and containing a kanamycin-resistance marker, and expressed constitutively under, for example, the CaMV 35S or COR15 promoters. pMEN20 or pMEN65 and other expression vectors may also be used for the purpose of modifying plant traits. For example, pMEN20 may be modified to replace the NptII coding region with the BAR gene of Streptomyces hygroscopicus that confers resistance to phosphinothricin. The KpnI and BglII sites of the Bar gene are removed by site-directed mutagenesis with silent codon changes.

The cloning vector may be introduced into a variety of cereal plants by means well known in the art including direct DNA transfer or Agrobacterium tumefaciens-mediated transformation. The latter approach may be accomplished by a variety of means, including, for example, that of U.S. Pat. No. 5,591,616, in which monocotyledon callus is transformed by contacting dedifferentiating tissue with the Agrobacterium containing the cloning vector.

The sample tissues are immersed in a suspension of 3×10⁹ cells of Agrobacterium containing the cloning vector for 3-10 minutes. The callus material is cultured on solid medium at 25° C. in the dark for several days. The calli grown on this medium are transferred to Regeneration medium. Transfers are continued every 2-3 weeks (2 or 3 times) until shoots develop. Shoots are then transferred to Shoot-Elongation medium every 2-3 weeks. Healthy looking shoots are transferred to rooting medium and after roots have developed, the plants are placed into moist potting soil.

The transformed plants are then analyzed for the presence of the NPTII gene/kanamycin resistance by ELISA, using the ELISA NPTII kit from 5Prime-3Prime Inc. (Boulder, Colo.).

It is also routine to use other methods to produce transgenic plants of most cereal crops (Vasil (1994) Plant Mol. Biol. 25: 925-937) such as corn, wheat, rice, sorghum (Casas et al. (1993) Proc. Natl. Acad. Sci. USA 90: 11212-11216), and barley (Wan and Lemaux (1994) Plant Physiol. 104: 37-48). DNA transfer methods such as the microprojectile method can be used for corn (Fromm et al. (1990) Bio/Technol. 8: 833-839; Gordon-Kamm et al. (1990) Plant Cell 2: 603-618; Ishida et al. (1996) Nature Biotechnol. 14:745-750), wheat (Vasil et al. (1992) Bio/Technol. 10:667-674; Vasil et al. (1993) Bio/Technol. 11:1553-1558; Weeks et al. (1993) Plant Physiol. 102:1077-1084), and rice (Christou (1991) Bio/Technol. 9:957-962; Hiei et al. (1994) Plant J. 6:271-282; Aldemita and Hodges (1996) Planta 199:612-617; and Hiei et al. (1997) Plant Mol. Biol. 35:205-218). For most cereal plants, embryogenic cells derived from immature scutellum tissues are the preferred cellular targets for transformation (Hiei et al. (1997) Plant Mol. Biol. 35:205-218; Vasil (1994) Plant Mol. Biol. 25: 925-937). For transforming corn embryogenic cells derived from immature scutellar tissue using microprojectile bombardment, the A188XB73 genotype is the preferred genotype (Fromm et al. (1990) Bio/Technol. 8: 833-839; Gordon-Kamm et al. (1990) Plant Cell 2: 603-618). After microprojectile bombardment the tissues are selected on phosphinothricin to identify the transgenic embryogenic cells (Gordon-Kamm et al. (1990) Plant Cell 2: 603-618). Transgenic plants are regenerated by standard corn regeneration techniques (Fromm et al. (1990) Bio/Technol. 8: 833-839; Gordon-Kamm et al. (1990) Plant Cell 2: 603-618).

Northern blot analysis, RT-PCR or microarray analysis of the regenerated, transformed plants may be used to show expression of G28-equivalog genes that are capable of inducing disease tolerance. Monocot-derived equivalogs of the G28 gene contain Motif Y or a subsequence substantially identical to Motif Y, and are shown to be expressed and thus may confer disease tolerance.

To verify the ability to confer tolerance, mature plants overexpressing a G28 or G3430 equivalog gene, or alternatively, seedling progeny of these plants, may be challenged with any of several disease-causing organisms, including, for example, the fungal pathogens Botrytis, Fusarium, Erysiphe, and Sclerotinia, or bacterial and other pathogens including Pseudomonas syringae, nematodes, mollicutes, parasites, or herbivorous arthropods.

By comparing wild type and transgenic plants similarly treated, the transgenic plants may be shown to have less fungal growth when inoculated with several of the fungal pathogens, or fewer adverse effects from disease caused by Pseudomonas syringae, nematodes, mollicutes, parasites, or herbivorous arthropods.

The transgenic plants may also have greater yield relative to a control plant when both are faced with the same pathogen challenge. Since members of the G28 clade may confer tolerance or resistance to multiple pathogens, plants overexpressing a member of the G3430 subclade of the G28 clade of transcription factor polypeptides may present a smaller yield loss than non-transgenic plants when the two types of plants are faced with similar challenges from any of a number of pathogens, including fungal pathogens. The symptoms of yield loss may include defoliation, chlorosis, stunting, lesions, loss of photosynthesis, distortions and necrosis, and thus methods for reducing yield loss may alleviate some or all of these symptoms.

After a monocot plant or plant cell has been transformed (and the latter regenerated into a plant) and shown to have greater tolerance or resistance to pathogens or greater produce yield relative to a control plant, the transformed monocot plant may be crossed with itself or a plant from the same line, a non-transformed or wild-type monocot plant, or another transformed monocot plant from a different transgenic line of plants.

These experiments would demonstrate that members of the G3430 subclade of transcription factor polypeptides can be identified and shown to confer disease tolerance or resistance in monocots, including tolerance or resistance to multiple pathogens.

Example XVII Induction of G28 Orthologs in Various Crop Species, Including Monocots

Real time PCR experiments, performed in the manner of Example VII, have shown that G28 (SEQ ID NO: 2, AtERF1) and its orthologs in Brassica napus (canola; orthologs Bn bh594074, Bn bh454277), Zea mays (corn; ortholog G3661, SEQ ID NO: 12) and Oryza sativa (rice; ortholog G3430, SEQ ID NO: 10) were induced by the disease-related hormone treatments MeJA and SA in the plant species in which they are found, which supports the premise that these sequences have conserved function across monocot and dicot lineages.

These experiments have demonstrated that members of the G28 clade of transcription factor polypeptides and its G3430 subclade have altered expression patterns in response to disease-related treatments, and, similar to G28, can confer disease tolerance or resistance, including in monocots and to multiple pathogens.

All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

The present invention is not limited by the specific embodiments described herein. The invention now being fully described, it will be apparent to one of ordinary skill in the art that many changes and modifications can be made thereto without departing from the spirit or scope of the appended claims. Modifications that become apparent from the foregoing description and accompanying figures fall within the scope of the claims.

1. A transgenic monocot plant having greater tolerance than a control plant to at least one pathogen, wherein the transgenic monocot plant comprises a recombinant polynucleotide encoding a polypeptide member of the G3430 subclade of transcription factor polypeptides.
2. The transgenic monocot plant of claim 1, wherein the polypeptide member comprises a Motif Y that is at least 82% identical to SEQ ID NO: 55.
3. The transgenic monocot plant of claim 2, wherein the recombinant polynucleotide encodes a polypeptide comprising SEQ ID NO: 55.
4. The transgenic monocot plant of claim 1, wherein the recombinant polynucleotide hybridizes over its full length to SEQ ID NO: 9 or its complement under stringent conditions; and wherein the stringent conditions include two wash steps of 6×SSC at 65° C., each step being 10-30 minutes in duration.
5. The transgenic monocot plant of claim 1, wherein the recombinant polynucleotide is operably linked to at least one regulatory element capable of regulating expression of the recombinant polynucleotide when the recombinant polynucleotide is transformed into a plant.
6. The transgenic monocot plant of claim 5, wherein said at least one regulatory element is selected from the group consisting of a promoter, a transcription initiation start site, an RNA processing signal, a transcription termination site, and a polyadenylation signal.
7. The transgenic monocot plant of claim 6, wherein the promoter is constitutive, inducible, or tissue-specific.
8. The transgenic monocot plant of claim 1, wherein the recombinant polynucleotide is incorporated into an expression vector.
9. The transgenic monocot plant of claim 1, wherein the transgenic monocot plant is a plant cell.
10. The transgenic monocot plant of claim 1, wherein the recombinant polynucleotide encodes a polypeptide comprising SEQ ID NO: 10.
11. The transgenic monocot plant of claim 1, wherein the at least one pathogen is at least one fungal pathogen.
12. The transgenic monocot plant of claim 11, wherein the at least one fungal pathogen is selected from the group consisting of Fusarium, Erysiphe, Sclerotinia and Botrytis.
13. The transgenic monocot plant of claim 1, wherein the recombinant polynucleotide comprises a nucleic acid sequence selected from the group consisting of SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 29, SEQ ID NO: 31, SEQ ID NO: 33, and SEQ ID NO: 35.
14. Seed produced from the transgenic monocot plant according to claim 1.
15. A method for producing a transformed monocot plant having greater tolerance or resistance to at least one pathogen than a control plant, said method comprising: (a) providing an expression vector comprising: (i) a polynucleotide sequence encoding a polypeptide comprising a Motif Y that is at least 82% identical to SEQ ID NO: 55; and (ii) regulatory elements flanking the polynucleotide sequence, said regulatory elements being able to control expression of the polynucleotide sequence in a target monocot plant; and (b) transforming the target monocot plant with the expression vector to generate a transformed monocot plant that is capable of expressing the polynucleotide sequence; wherein the expression of the polynucleotide sequence results in the transformed monocot plant with greater tolerance or resistance to the at least one pathogen than the control plant.
16. The method of claim 15, wherein said polynucleotide sequence hybridizes to SEQ ID NO: 9 under the stringent conditions of 6×SSC and 65° C.
17. The method of claim 15, wherein said at least one pathogen is at least one fungal pathogen.
18. The method of claim 17, wherein the at least one fungal pathogen is selected from the group consisting of Botrytis, Fusarium, Erysiphe, and Sclerotinia.
19. The method of claim 15, the method steps further comprising: (c) selfing or crossing the transformed monocot plant with itself or another monocot plant, respectively, to produce seed; and (d) growing a progeny monocot plant from the seed; wherein the progeny monocot plant has greater tolerance or resistance to the at least one pathogen than the control plant.
20. A method for reducing yield loss due to a plant disease in a monocot plant, the method comprising: (a) providing an expression vector comprising: (i) a polynucleotide sequence encoding a polypeptide comprising a Motif Y that is at least 82% identical to SEQ ID NO: 55; and (ii) regulatory elements flanking the polynucleotide sequence, said regulatory elements being able to control expression of the polynucleotide sequence in a target monocot plant; and (b) transforming the target monocot plant with the expression vector to generate a transformed monocot plant that is capable of expressing the polynucleotide sequence; and (c) growing the transformed monocot plant; wherein the expression of the polynucleotide sequence results in the transformed monocot plant having reduced yield loss due to the plant disease when the transformed monocot plant is contacted by at least one pathogen.
21. The method of claim 20, wherein said plant disease is caused by at least one pathogen.
22. The method of claim 21, wherein said at least one pathogen is at least one fungal pathogen.
23. The method of claim 22, wherein the at least one fungal pathogen is selected from the group consisting of Botrytis, Fusarium, Erysiphe, and Sclerotinia.
24. The method of claim 20, wherein the method alleviates one or more disease symptoms selected from the group consisting of defoliation, chlorosis, stunting, lesions, loss of photosynthesis, distortions and necrosis.